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ABSTRACT 


Determining the best method for granting World Wide Web (Web) users access to 
remote relational databases is difficult. Choosing the best supporting Web/database link 
method for implementation requires an in-depth understanding of the methods available and 
the relationship between the link designer’s goals and the underlying issues of Performance 
and Functionality, Cost, Development Time and Ease, Serviceability, Flexibility and 
Openness, Security, State and Session. 

This thesis examined existing methods for enabling Web-client access to remote 
relational databases and found that most fall within the general categories of Common 
Gateway Interface scripts. Server Application Programming Interfaces, Web-enabled 
Database Management Systems exporting query results in Hypertext Markup Language, and 
independent client-based processes such as Java applets. To determine the best database 
access category we compared each one to the underlying link issues and conducted a case 
study for the IEEE Signal Processing Society. 

The results of this thesis are: (1) a taxonomy of existing WebXdatabase linking 
methods, (2) a thorough listing and examination of the underlying issues as they relate to 
each link method, (3) recommendation and specification of the proper link method and 
hardware/software support system for the case study linkage problem. 
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1. INTRODUCTION 


A. OBJECTIVES AND RESEARCH QUESTIONS 

The main objective of this thesis is to exam the issues surrounding linkage of a 
database management system (DBMS) to the World Wide Web (Web) in support of dynamic 
data access. All DBMS’s in the DBMSAVeb linkage discussion are assumed to follow the 
relational model (still the most widely used)[Plain96] unless otherwise stated. Morever, only 
currently available and utilized methods, protocols and standards are examined and thus, 
those links requiring radical changes to the Web or the DBMS’s themselves are not covered. 

Secondary objectives supporting this thesis include an investigation, analysis, and 
creation of a taxonomy of existing dynamic DBMS linking methods, a discussion of related 
issues and tradeoffs, and the design of a solution to a prototypical case study. The case study 
offers the opportunity to further investigate the alternative l in k methods, weigh them against 
the related connectivity issues, and illustrate the decision processes relevant to designing and 
implementing a Web/DBMS link. 

B. THESIS ORGANIZATION 

Chapter n introduces the database and Internet concepts which serve as the 
foundation to the discussions in this thesis. Chapter El surveys common Web/DBMS link 
software components and discusses how they work together in building links. Chapter IV 
lists link related issues and analyzes required tradeoffs between them which underlie every 
link approach. Chapter V introduces and analyzes a case study while Chapter VI redefines 
the case study problems in terms of user needs and functional requirements. The chapter 
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then discusses the potential use of the specific linkage approaches previously compared and 
concludes by outlining the actual design for implementing the most appropriate of the 
potential approaches. Chapter Vn contains the thesis summary and conclusion, an analysis 
of data obtained, and thoughts for future work and potential applications. Appendices 
summarize two separate discussions between the thesis author and case study personnel. 
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II. BACKGROUND 


A. A HISTORY OF DATABASE MANAGEMENT SYSTEMS 

The term database [Rob95] refers to both the electronic storage of data (raw facts) 
in such data structures as linked lists, trees and hash tables, and the associated meta-data or 
data about the raw facts. A DBMS [Rob95] includes the database plus the combination of 
programs controlling access and allowing manipulation of the data. Generally, DBMS’s are 
categorized by the functions supported and the architecture with which they maintain data 
relationships. Functionally, DBMS’s range from the local, stand-alone DBMS’s found on 
personal computers and mainframes to the distributed, shared DBMS’s supporting the large 
numbers of multiple users and simultaneous queries commonly associated with large data 
processing applications. 

1. Evolution of DBMS Architectures 

Beginning with the use of flat-files in the 1950's and continuing with the relational 
and object-oriented DBMS’s of today, DBMS architects have sought to provide robust 
solutions to ubiquitous data storage and manipulation needs [Burleson94]. As time passed, 
each subsequent method brought additional functionality and solutions to the problems or 
constraints associated with its predecessors. 
a. Flat-files 

Flat-file systems were the original DBMS [Burleson94]. Primitive direct data 
access methods used by these early systems included simplistic sequential searching of sorted 
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listings or, at best, hashing algorithms that mapped files to specific disk or tape storage 
addresses. Problems associated with these methods included an inability to recognize, 
establish, or manage data relationships, the development of “information islands” within 
organizations as different departments stored duplicate information in different types of file 
structures written in different languages, and the tight coupling of data and programs that 
always led to maintenance, concurrency, and recovery problems. 

b. The Relational Model 

To address flat-file system problems. Dr. E. F. Codd, a researcher at IBM in 
the 1960's, developed the relational database model in which data resides in two-dimensional 
tables representing entities such as Customers, Inventory, Airplanes, etc [Burleson94]. Each 
row, or tuple, in the table holds data pertaining to one particular entity instance (Susy, 
hammer, B-52, etc). Additionally, one column in the table contains entity-unique data called 
a primary key that has two purposes. First, it serves to uniquely identify specific tuples 
within the table, or; secondly, when the primary key column is redundant to two or more 
tables, it provides links, or relationships, between the tables that can be navigated and 
manipulated with ease. 

Improvements offered by relational databases included a simple conceptual 
framework, data independence in which data relationships were no longer hard-linked by the 
supporting data structures, control of relationship cardinality via the enforcement of 
constraints upon addition and deletion of data, and declarative data access u tilizin g relational 
set theory eneapsulated within the mechanics of structured query language (SQL). SQL 
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initially offered only three classes of operators [Burleson94] SELECT, which shrunk the 
tables vertically by eliminating rows; PROJECT, which shrunk the table horizontally by 
removing columns; and JOIN, which dynamically linked two tables together via common 
column values. 

Problems associated with the relational model include the inability to store 
business rules that constrain and affect the behavior of data, and persistent difficulties in 
handling recursive relationships. 

c. The Object Model 

The Object Model of the 1980's and 1990's seeks to address the relational 
model’s drawbacks and, by utilizing the properties of encapsulation, abstraction, and 
polymorphism, to more effectively model and replicate relationships in the real world 
[Burleson94]. By storing the behavior of the data within the DBMS alongside the data and 
their relationships, object technology promises to be less maintenance-intensive since all 
changes to data attributes and behaviors are handled within the DBMS instead of by the 
programmer. 

2. Interconnecting DBMS’s 

As the amount of stored data increased, the need arose for different entities in 
different geographic locations to share data in a collaborative fashion. It was at this point 
that the limitations of monolithic DBMS’s became apparent, leading DBMS architects to 
create the initial distributed DBMS’s of the 1970's which later evolved into the inter¬ 
connected networks of distributed DBMS’s in use today. 
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a. Essentials of a Distributed DBMS 

All successfully interconnected DBMS’s share certain characteristics. At the 
highest level, distributed DBMS’s should easily handle geographic dispersion and 
heterogeneous hardware/software. To be ideal they must also include the following traits 
[Burleson94]: 

• local autonomy, meaning all data in the distributed network is owned and 
managed on a local basis independent of the overall system. 

• no reliance upon a central site, i.e, all sites are equally “remote” with no one 
site having governing authority over another’s data dictionaries or security. 

• continuous operation supporting seamless, around-the-clock access. 

• location independence, since end-users don’t care or need to know about the 
physical location of DBMS’s comprising the system. 

• hardware/operating system independence with no queries tied to specific 
proprietary systems. 

• network independence where network protocols are invisible to the end-user 
and differences in topologies are accommodated. 

• DBMS independence allowing non-DBMS-specific retrieval and update 
operations. 

• data integrity management of concurrent updates, ensuring the ability to 
handle transaction-type computing. 
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b. Client/Server 

Because of the similarity between desired traits for distributed DBMS’s and 
those of the client/server paradigm, today’s distributed DBMS products predominantly use 
client/server [Orfali94] to link decoupled entities just long enough for them to accomplish 
pre-specified units of work. Client/server (Figure 1) is predicated upon two independent and 
autonomous processes working together (usually over a network) [Orfali94]; client processes 
which request specific services, and server processes which respond to the requests. 



Figure 1. Client/server architecture 


Client/server seeks to optimally distribute these processing activities via 
many-to-many relationships over different computer platforms, using the best characteristics 
of each platform to the most advantageous end. The anticipated benefit of client/server is 
the ability to abstract hardware and software concerns and focus on developing and building 
user-friendly, cost-effective systems. 


7 





All client/server systems have the following distinguishing characteristics 

[Orfali94]; 

• service: client/server primarily involves a two-process relationship with the 
server on one machine providing services to the client which is (most 
probably) running on a separate machine. 

• shared resources: one server can service many clients at the same time and 
regulate their access to shared resources. 

• asymmetrical protocols: clients always initiate the many-to-one relationship 
by requesting a service from the passively waiting server. 

• transparency of location: the server process can reside on either the same 
machine as the client or on a different machine across the network. 

• message-based exchanges: clients and servers interact via a loosely-coupled 
message passing mechanism. 

• encapsulation and integrity: the server uses centrally-maintained code to 
perform the services in a manner transparent to the client. 

• scalability: client/server systems can be scaled horizontally by adding new 
clients, and vertically by migrating to newer, faster and more machines. 

• tier: client/server implementations can be classified by the location of the 
split between user interface, application logic, and data repository functions. 
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c. Tier 

The three basic client/server functions of data reservoir, user interface, and 
application logic are predominantly classified and divisible into two or three tiers 
[Finkelstein96]. Two-tier client/server systems are recognized by the division of the three 
DBMS’s functions between the client and only one server. In other words, all logic not 
actually on the client is co-located on the server. Two-tier implementations are themselves 
divisible into fat and thin client implementations [Orfali94], with/at clients holding both 
presentation logic and application logic and thin clients simply having presentation logic 
alone. 

With three-tier implementations a thin client presents DBMS interaction 
results while residing separately are the data reservoir on a data server and the application 
logic on the application server. The benefits of spreading the three functions among multiple 
tiers include [Centura96] faster performance because complex application logic can be 
executed on more powerful machines, and the improved ability to support end users by 
adding additional software or hardware to improve performance in those sites sustaining 
large numbers of simultaneous queries. 

B. EMERGENCE OF THE WORLD WIDE WEB 

The World Wide Web, also known as the Web or WWW [Marshall95], is a 
client/server implementation comprised of all the inter-networked {sic Internet) software and 
hardware information-retrieval components required to for around-the-clock access by 
anybody to any available information, located anywhere in the world. The WWW was 
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originally developed at CERN in March 1989 to facilitate information dissemination and 
sharing between collaborative groups across the world. At that time most users worked for 
the government or large universities and information was available primarily via anonymous 
File Tranter Protocol (FTP) using a simple line-mode browser. In January of 1993, access 
was made much easier with release of the alpha version of the X-MOSAIC browser and 
WWW use exploded to the high levels evident today. 

1. Web Architecture 

As previously stated, the architecture of the Web simply represents another step in 
the evolution of client/server technology. Distinguishing features of the Web client/server 
version include the extreme thinness of clients utilized simply to render documents for 
viewer consumption and the sheer number and variance in platforms participating in the 
network. Evident from the beginning, though, is that many of the Web’s underlying 
technologies directly mirror the desired traits for distributed DBMS implementations. 

a. Client/User-Inteiface Architecture 

From the user point of view the Web is a series of documents written in 
hyper-text markup language (HTML) [Schulzrinne96]. A Web document, also known as an 
HTML document, consists of special “tags” embedded in the document text which direct the 
user interface application in rendering the document into usable format. 

At the client end is the browser, a universally supported, ultra-thin user 
interface that supports the user view of the Web as a series of pages to be quickly rendered 
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into a digestible format. A special attraction of the browser is that it is easily and 
inexpensively deployed to very large numbers of users. 

Web pages are connected in many-to-many relationships by universal 
resource locator (URL) [Schulzrinne96] specified hyperlinks exemplified by the fictional 
URL “http://www.nps.navy.iml/~asdeaii/index.html.” A hyperlink contains all data 
necessaiy for the browser to locate and render the document since it indicates the connection 
method {hyper-text tranter protocol, e.g. http), the server’s name (www.nps.navy.mil), the 
resource’s name (~asdean/uidex.html) and the resource data type (.html). Resources named 
“index.html” usually indicate the homepage [Lynnworth96], a term referring to a Web site’s 
initial visitor starting page that includes welcoming data and hyperlinks leading to other 
pages located at the site or elsewhere in the Web. 

b. Server/System Architecture 

From the system viewpoint the Web consists of a world-wide network utilized 
to briefly connect clients to multitudes of servers, each of which is holding HTML 
documents made available for access. The network supports Web client/server 
implementations by establishing a connection between the requesting application (browser) 
and the responding application (Web server) when users click upon URL’s. During the 
connection, the server receives information from the browser regarding connection method 
(FTP, HTTP, etc.), resource URL code, resource data type, and client-specific browser 
type/model data. It then responds to the request with a protocol version status line, resource 
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access success/error code, relevant entity meta-information and the appropriate HTML 
resource, and then closes the connection. 

2. Standard Internet Protocols 

In the Internet protocol hierarchy [Stallings94] sho\vn in Figure 2, a client request 
flows down from the browser application to the application layer, then on to the transport 
and Internet layers till it reaches the network access layer just prior to crossing the physical 
infrastructure. 

To ensure client requests work properly, the browser must work with the host 
machine’s operating system and connection hardware to ensure client requests are in proper 
format and to correctly process each server response. 

HTTP (1.0) [Schulzrinne96], as the application layer protocol, determines the 
beginning, duration, and ending of a single communication session. For this reason, a solid 
understanding of HTTP’s stateless nature and its ultimate effect on Web communications is 
absolutely vital to this discussion of Web/database connectivity solutions. An analogy for 
stateless communications would be a telephone conversation where the called party hangs 
up after answering each of the caller’s questions, and when called again, has no recollection 
of previous answers and conversations. Because of statelessness, each client/server session 
must stand alone as a complete entity unless special measures are taken to retain records of 
the transaction somewhere in the loop. 
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Below the application layer, and above the network card or modem, is the 
Transmission Control Protocol/Intemet Protocol (TCP/IP) [Jordan94] transportation layer 
and the Internet layer that work together to facilitate all WWW traffic by abstracting 
individual network hardware protocols. TCP/IP is characterized by connection-less (no- 
dedicated conduit) conGununications where each message is composed of multiple uniformly- 
sized packets of data, each of which making its way alone to and from the server. When the 
packets arrive and are re-assembled at the server, they allow reconstruction of the original 
message. This process works the same in reverse when the server application sends its 
response back to the client. 
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3. 


Current Web Status 


Estimates of current Web usage vaiy, but Informix Corporation [Informix96 ]holds 
that there will be over 200 million Web users by the year 2000 and that the Web-related 
software market will grow from $260 million in 1995 to $4 billion by the end of 1996, a 
sixteen-fold increase. Along with this explosive growth in use is a corresponding revolution 
in content since Web pages have changed from static representations of archived files to 
extravagant multimedia productions. These newer sites allow intuitive user navigation of 
Web sites and offer such innovative mixed-media combinations of color, sound, and 
animation that sites not providing this level of service tend to be avoided [Informix96]. 

C. CONVERGENCE OF DBMS’s AND THE WEB 

DBMS products have now demonstrated the capability of managing mixed-media in 
a collaborative workflow environment [Frank95]. Vendors are selling rapid application 
development (RAD) tools incorporating pre-written question/answer “wizards” that abstract 
underlying technical details and guide developers through the development process. These 
tools, which are common for either monolithic or LAN-based client/server environments, 
have not completely solved all the problems associated with the uneasy linkage of proprietary 
versions of SQL and other protocols and so they limit DBMS’s ability to cheaply scale up 
to the inter-networked level of distributed computing. It is because of this that DBMS’s need 
to find an easy method of disseminating rich content on a massive scale without the high 
costs associated with proprietary solutions. 
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From the Web perspective, it is content that is king, and site builders have irrevocably 
moved from presenting static text to allowing the user to interact with rich, dynamic and 
fluid multimedia sites. Interactive [Stormcloud96] in this sense means collecting and 
processing user input that is more complex than simply choosing from among a set of static 
links. Throughout the world and throughout the day, a growing number of users demand 
interactive access with photographs, graphics, text, audio and video, all of which must be 
archived for real-time retrieval and editing. The problem with this demand is that traditional 
HTTP servers are incapable of supporting this level and type of interaction [Rowe96], let 
alone facilitating quick, easy, and innovative customization of content. 

Given the current and projected levels and methods of Web use, the marriage of 
DBMS’s and the Web is a natural one. The Web is a compelling platform for database- 
driven interactive applications because it offers platform independence and instant global 
access along with the possibility of significantly lowering development, deployment, 
training, maintenance, and upgrade costs. Concurrent with DBMS products handling new 
data-types and developing new distributed processing capability is the Web-driven demand 
for increasing quantities of these new data-types along with new, richer ways to manage and 
interact with them. 

The bottom line is that DBMS’s need for low-cost, widespread connectivity and the 
Web’s need for robust tools to manage dynamic interaction with multimedia content has 
caused the two technologies to converge. Key to this convergence is discovering and 
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creating a link between the two that leverages their respective strengths without unduly 
restricting each other’s ability to address and resolve important issues. 
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III. COMPONENTS IN THE WEB/DBMS LINK 


A. INTRODUCTION 

A Web/DBMS link component is any identifiable (e.g., self-contained and/or 
standalone) software program utilized to dynamically generate data. These components, 
though wide-ranging in complexity and purpose, all have the common goal of enabling 
Web/DBMS linkage. 

This chapter begins with an explanation of the basic data access cycle common to all 
approaches and an introduction to a fictional database that will be used throughout the 
chapter for illustration purposes. The following sections explore a taxonomy of components 
commonly used to build Web/DBMS links. The taxonomy is grouped into client, server and 
middleware categories and we will provide a brief discussion of the mechanics of each one. 
We conclude the chapter with an introduction to the main Web/DBMS approach categories 
that we further discuss in Chapter IH when the issues surrounding Web/DBMS links are 
more fully explored. 

1. The Data Access Cycle 

All Web/DBMS links have the common data access cycle since, regardless of which 
software products are used to build a Web/DBMS link, all approaches fundamentally rely 
upon a message-based form of interaction called remote procedure calls (RPC) [Orfali94] 
for data access. As demonstrated in (Figure 3), on the calling (client) side of the RPC is the 
browser serving as the graphical user interface (GUI) that accepts user inputs and renders 
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HTML documents. Each remote call commences when the user clicks on a URL displayed 
by the browser, opens a connection with the remote server site via middleware 
communication processes, and sends a request for data. 

The remote server, which listens passively for client connection calls and data 
requests, completes the cycle by parsing, interpreting and acting upon the request, encoding 
the response in HTML for browser rendering, sending the results back to the client via 
middleware, and finally closing the connection. 



2. Example Database 

To illustrate how respective eomponent characteristics and mechanics are used to 
execute the data eycle, a relational database comprised of two tables will be used throughout 
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this chapter. The database pertains to a fictional campus food stand called “THAI SHACK” 
that is owned and operated by its owner “Ralph Thai.” Ralph manages his menu and 
customer lists with a Web-linked database containing “MENU” (Table 1) which tracks 
menu-item ingredients, and “CUSTOMERS” (Table 2) used to track customer menu 
preferences. For both tables the primary key is “Name.” 


Name (key) 

Gender 

Graduation-Date 

Favorite 

Suzy 

Female 

Mar 1997 

Tiger Cry 

Dewey 

Male 

Oct 1998 

Spicy Soup 

Gidget 

Female 

Dec 1998 

Red Curry 


Table 1. CUSTOMERS 


Name (key) 

Meat 

Coconut 

Hot Peppers 

Tiger Cry 

Beef 

No 

Yes 

Spicy Soup 

Chicken 

Yes 

Yes 

Red Cuny 

Chicken 

No 

No 


Table 2. MENU 


B. CLIENT COMPONENTS 
1. Browsers 

In the simplest method of Web/database interaction the tasks of a “standard” Web 
browser are limited and the data access cycle begins with the user clicking on a URL to 
render fill-in-the-blank “forms.” The forms capture the user input for such pre-fabricated 
database queries as the following SQL pseudo-code: 
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SELECT Names 

FROM CUSTOMERS 

WHERE Favorite = “X” 

The cycle continues when the user selects a specific Favorite (for example “Red 
Curry”), and clicks on a submit button to send the data request to the server. At the server 
the “X” is replaced with the selected favorite, the query is executed, and the results are 
formatted in HTML. The cycle is complete when the browser renders the HTML-formatted 
data returned by the server. 

Advantages of using standard browsers include simple, standardized functionality and 
universal Web server support without the additional costs associated with proprietary 
software. A major limitation in relying upon standard browsers (as opposed to fat clients, 
discussed next) is the dependence upon the pre-fabricated queries stored on the server, and 
the large number of network accesses required to receive client input, return error messages 
for incomplete or inaccurate entries, and then respond again with the correct query results. 
a. Plug-ins and Fat-clients 

In an attempt to limit the number of network accesses, tailor the query process 
for each user, and reduce server workload, some developers have created variations on the 
standard browser. These variations, called browser plug-ins, rely upon the browser knowing 
about a server-side database’s existence and interacting with it in some proprietary fashion 
to fetch and interpret results on the client side. Plug-in’s [Netscape96] are software modules 
that extend the standard browser’s capabilities. Plug-in’s accomplish such tasks as guiding 
the user through the query process or performing rudimentary business logic with respect to 
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form completion. They also ensure data type/query language compliance prior to actually 
submitting the request to the network. An example would be if a THAI-SHACK database 
user input “Susy” in a fill-in form field supporting a query based upon Graduation Date. The 
plug-in would know that “Susy” is not an alphanumeric and could provide immediate 
feedback to prevent the erroneous database query from crossing the network. 

Fat-clients take browser variations to the extreme and are exemplified by the 

classical client/server distributed DBMS, whereupon the DBMS runs on the client side and 

needs no WWW support other than to provide a connection to the server-side data reservoir 

[Orfali94]. An example of this situation is when an entire query similar to the following 

SQL pseudo-code: 

SELECT Names 

FROM MENU 

WHERE Meat = “CHICKEN” 

is created on the client side, checked for syntactical and data-type errors, and then a fetch 
routine listing specific tuples to be gathered is sent across the network to the server-side data 
reservoir. These fat-clients, or database front-ends, usually require proprietary licensing 
agreements and are restricted in their capability to work with anything other than vendor- 
specific products except for lowest common denominator middleware products 
[DataRamp96] like ODBC (discussed in the middleware section of this chapter). The 
advantage to using fat-clients is that DBMS/user interaction no longer depends upon pre¬ 
formatted queries and the supported functionality approaches that of a stand-alone database 
on a PC. The disadvantage with fat clients is that they routinely cost more, are problematic 
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with respect to upgrades and maintenance due to wide dispersion of multiple copies. 
Additionally, fat clients can, require users in some cases to have a high degree of familiarity 
with database specifics and queiy languages. 

b. Client Record Storage 

Some browsers incorporate the ability to store records in client permanent 

memory. When users click on a URL, downloaded pages are stored (cached) permanently 

in client memory so subsequent accesses only require the downloading of updates. The 

usefulness of caching is inversely proportional to the richness and interactive/dynamic nature 

of page content since, unlike static text pages, those pages containing dynamically generated 

information such as links to other URL’s or multimedia files that were generated in response 

to queries do not readily lend themselves to caching. This is true because the data that was 

gathered and used to build the page may have been changed or deleted subsequent to the last 

access. Utilizing the example database, the results of the SQL pseudo-code: 

SELECT Names 

FROM CUSTOMERS 

WHERE Favorite = “Red Curry” 

once cached would not change from one page load to another despite database entries which 
may have changed relevant information held at the server. 

To counteract the static nature of caching, browsers can store, along with the 
page itself, another record type commonly known by the Netscape Corporation’s term 
“cookie.” Cookies [Netscape96-2] record such data as time stamps or other indications of 
previous user input, and whenever the users make subsequent requests, the information 
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written in the cookie is transparently sent to the server to facilitate interactive responses. 
With respect to the example database, the browser could compare the date and time of 
cached Red Curry query results against a server-supplied time of last database update and, 
if cached results have been superseded, new results could be generated. This guarantees that 
users are always presented with information reflecting the most recent database state. 

2. Independent Processes 

Essentially a hybrid plug-in, independent processes are client-resident entities 
working within the browser to dramatically extend client functionality in the Web/DBMS 
link. Despite their dependence on browser hosting, these processes are considered 
independent since they are self-contained and not part of the browser itself. 
a. Java Applets 

Created by Sun Microsystems, Java applets [Hamilton96] are small, self- 
contained, object-oriented, platform-independent, multi-threaded processes downloaded by 
the client browser to deploy application logic services on the client side of the connection. 
Applets come over the Internet in partially-compiled Java-specific binary code that is fully 
interpreted upon download so they can immediately start mnning on a virtual machine inside 
the browser. The virtual machine [Hamilton96] is a software process residing inside the 
browser application’s memory space that emulates the instruction set of specific computer 
hardware. Since the Java byte code is interpreted instead of compiled, and the virtual 
machine is a plug-in, they offer the benefit of complete portability. In other words, the 
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specific type and instruction set of client hardware is irrelevant since the applets run on the 
virtual machine inside the browser instead of on the client host. 

Applet hosting on the client computer can solve many of static HTML’s 
limitations. This is especially true for database applications since applets enable database 
servers to concentrate on serving data while the independent process authenticates users, 
provides business logic such as data-range checking or form completion, and then obtains 
data from the DBMS on the server. 

Returning to the example database, an applet would execute upon download 

to capture a user’s data request and error-check such complex tailored queries as the 

following SQL pseudo-code: 

SELECT Names 

FROM CUSTOMERS 

WHERE 

SELECT * 

FROM MENU 

WHERE CUSTOMER.Favorite = MENU.Name 

and 

Meat = “Chicken” 
and 

Graduation Date’s < “Red Curry”(erroneous input) 

When the applet error-checks the request for data type and SQL syntax 
agreement, it notices that Red Curry is not a proper entry for the Graduation Date column. 
The applet then informs the user to make changes, and once Red Curry is changed to a proper 
date such as “Dec 1997," the applet builds a JOIN between the CUSTOMER and MENU 
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tables, sends the data request to the server, and aids the browser in rendering the returned 
output. 

The primary benefits of applets are portability and the savings in lag-time 
associated with conducting all business and application logic prior to network transmission. 
The portability benefit comes at the potential expense of a 30-fold speed deficit to compiled 
processes [Yourdon96] since there is an additional layer of processing required to translate 
between virtual and actual host machines. 

An additional downside associated with applets concerns security and the fear 
of malevolent processes potentially accessing client files and hardware. These possible 
security infractions include [Dean96] stealing passwords, copying or destroying files, spying 
on client hardware and software configurations, attacking flash BIOS, or copying more and 
larger malicious programs from Internet-connected computers. 

b. Active-X 

Active-X [Murdoch96] is very similar to Java in concept and, as an 
evolutionary technology based on object linking and embedding (OLE) technology, is 
Microsoft’s approach to thin-client Internet computing. On a Web page, Active-X can 
accomplish a wide variety of tasks and extend normal HTML coding by incorporating into 
Web applications small object-oriented programs called “custom controls” [Microsoft96]. 
These controls can seamlessly handle repetitive and/or complicated functions such as menus, 
scroll bars, buttons, pre-loaders, and timers. The tme power of these controls, though, comes 
from their ability to perform traditional server-side functions by downloading at runtime 
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functional modules from the network and contacting other hosts outside of the client/server 
link. Active-X is envisioned by Microsoft [Murdoch96] to be a method of standardizing web 
site development and making data available transparently no matter where it is located. On 

the downside, there is a strong security concern with Active-X that mirrors the concerns with 
Java. 


C. SERVER COMPONENTS 

1. The Relational Database 

The relational database is the basis for most Web/DBMS links [Plain96] and it 
remains fundamentally the same regardless of whether it is a stand-alone or Web-connected 
application. Web/DBMS link applications differ in what the relational database can offer, 
and it is the implementation of SQL within the application that determines robustness of 
query requests and such additional functions as protection of data integrity, user 
authentication, or limited data access based upon multiple security levels. 


a. Extended HTML/Embedded SQL 

Some database applications utilize Web pages seeded with special HTML tags 
and the SQL code required to conduct pre-formatted database queries and create the 
corresponding responsive Web page [Frank96]. When a data request is received by the 
HTTP server, it utihzes a special program or set of macros to map specific extended-HTML 
form fields to corresponding place holders located within an SQL statement. The same 


program then submits the SQL query to the database server, formats the results in HTML, 
and releases the results to the Web server. 
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With embedded SQL [Frank96], it is the query itself and form fields that are 
sent by the browser. A special server-side parsing application strips an embedded SQL query 
and related data from the browser-generated input stream, executes the query, merges the 
results with the original or stored Web page, and then passes the output back to the Web 
server for HTML encoding and return to the client. Almost invariably the use of embedded 
SQL code requires extensions to standard HTML tags to signal the HTTP Web server that 
a database query must be conducted. 

With respect to THAI SHACK, a query similar to the following SQL pseudo¬ 
code: 

SELECT Names 

FROM X 

WHERE Y = “Z” 

would be stored in the server-side DBMS. The user would be offered the opportunity to see 
and select food made with different items, and if they selected “Coconut,” then X becomes 
MENU, Y becomes Coconut and Z becomes Yes. These entries are swapped into the correct 
place in the query statement, the query is conducted, and the results formatted into HTML 
and passed to the HTTP server for return to the client. 

Although various HTML extension and embedded SQL codes 
implementations are proprietary, and therefore limit the openness of the total solution, they 
are useful in customizing page content in response to a user’s identifier or computer Internet 
address captured by the Web server [Gundavaram96]. Additionally, they perform faster than 
applications requiring the execution of more complicated methods and can be easier for 


27 



developers to use since they dispense with requirements for writing, co mpilin gs and 
maintaining multiple programs. 


One note of caution with the use of HTML extensions is that Web pages 
containing proprietary tags may not be properly displayed by some browsers [McGee96]. 
This means users may miss some of the content, since only those pages conforming to the 
standards issued by the WWW Consortium (and do not contain proprietary extensions) can 
be universally read. 

b. Templates 

Another way to lighten developer workload is to reduce the amount of time 
spent writing the HTML code used to format database output. Some DBMS products are 
“Web enabled,” meaning both data and the programs for embedding HTML code into the 
data output are located within the database [Dobson95]. In operation, when the browser calls 
the HTTP server with its request, a stored program is actually called instead of a specific 
HTML page. The stored program executes a database query, reads in a generic template 
serving as the basis for the Web page, and populates areas marked by predefined tags with 
values retrieved from the query. Once the program has completed the request query and 
HTML formatting, it sends it to the HTTP Web server for shipment back to the browser. 

in the case of templates, the results of the previous Coconut query are 
individually mapped to specifically-marked places on pre-formatted pages stored in their 
entirety within the database. This allows additional flexibility in page presentation since a 
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query that results from inputting Yes under Coconut, for example, could map to a page with 
a palm tree motif, while the results of No under Coconut would not. 

Advantages of Web-enabled databases are the familiarity of the development 
environment, bypassing the use of scripting languages (discussed in the following section), 
and the simplification of web page organization since the web page logic, graphics, HTML 
encoder, and other resources are all stored together. The primary disadvantage of templates 
are their proprietary nature. 

2. Database Gateways 

The function of the basic HTTP server is to listen for incoming connection requests 
and either return a file or pass the request on to another application [Rowe96]. Since 
database queries are not included in these rudimentary services, developers have created 
interface programs to stand between the HTTP server and the database and facilitate the 
enhanced translation and communication capabilities. Approaches to these interface 
programs range from open interface standards called gateways to proprietary 
implementations known as application program interfaces (API’s). 

a. Common Gateway Interface 

Common Gateway Interface (CGI), called the “poor man’s client/server” 
[Vamey96], is an open interface standard enabling clients to initiate and pass information to 
server-based programs that generate responsive information in the form of dynamically 
created Web pages [Gundavaram96]. CGI’s key service is providing client access to 
information that would otherwise be unreachable without the Web server r unnin g these 
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external programs. With a simple file, the Web server’s job is to respond to the http request 
by returning a file and when CGI is involved, its additional responsibility is to generate the 
file’s content dynamically. As illustrated in Figure 4, clicking on the URL hyperlink 
executes a stored CGI program and begins the access process. 



Figure 4. The CGI Process 

It is the Web server itself that runs the program by forking a new process, 
passing client-submitted data to the new process, giving up CPU cycles to the new process, 
formatting the returned results in HTML, sending results to the client and then killing the 
process [Gundavaram96]. CGI is single threaded, meaning servers must create a separate 
process for each request received, with increased numbers of requests leading to increased 
numbers of processes. 

Sites utilizing CGI must provide guidance to the HTTP server and 
differentiate between static pages and those generated dynamically. One convention is to 
have static pages end with the “.html” suffix and dynamic pages end with “.cgi.” so that a site 
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would have a static homepage provide welcoming information, site “how-to’s,” and form 
fill-ins along with links to the other pages at the site offering more robust interaction. 

In the example case, a user desiring to “see the names of female customers 

who eat beef’ would utilize the static homepage “THAI-SHACK.html” to access a 

dynamically generated page called “THAI-SHACK-PROFILE.cgi.” Once the user completes 

the fill-in form and clicks on “Submit,” the browser assigns the values Gender=Female and 

Meat=Beef, sending the query input to the HTTP server which in turn spawns a CGI process 

and hands off the input. The CGI process would generate a query similar to the following 

SQL pseudo-code: 

SELECT Names 

FROM CUSTOMERS 

WHERE 

SELECT * 

FROM MENU 

WHERE CUSTOMER.Favorite = MENU.Name 

and 

Gender = “Female” 

Query results are then formatted into HTML by the CGI process and passed back to the 
HTTP Web server for transmission to the client. 

One benefit of using CGI is the large base of pre-written programs available 
for free from the Internet and other sources. Other benefits are extreme flexibility in 
providing page content, almost universal support among commercial servers (ergo, highly 
portable), and compatibility with almost any database application language. In fact, CGI 
programs can be written in any language that can create an executable for the given operating 
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system [Lynnworth96] despite the fact that most CGI programs are UNIX based and 
therefore the CGI programs are normally Unix shell scripts or Perl scripts [Yager96]. These 
scripts are unique in that they are runtime “interpreted languages” as opposed to compile 
time “compiled languages” that generate binary executable files in the host machine’s native 
hardware language code [Brenner96]. Care must be taken with respect to choice of 
languages since interpreted languages run more slowly than compiled languages and can thus 
slow server performance. Tediously created CGI applications can’t handle rich data types 
easily [Gaffney96] and they require the programmer to code in both HTML and the 
supporting scripting language to make processes that contains the database application logic 
required to execute the query. These processes offer poor performance due to the continual 
forking of a new process with each request and the repeated database opening/closing that 
overtaxes server CPU cycles. CGI is adequate for simple queries but is unable to handle 
rigorous compound transactions where the results of one operation are required to determine 
the next one. 

b. API’s and DLL’s 

One way around the performance penalty of open gateways and interpreted 
languages is to use native code (proprietary to specific vendors) to provide translation 
services. This natively compiled code, or API, provides the fastest way to access a database 
via the Web [Rowe96] because it does not need a run-time interpreter and because, for each 
request, it can load and run much faster than interpreted code and remain in memory if 
desired. An API works by providing a library of function calls, usually written in C or C-H- 
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[Rowe96], that allow a link programmer to make function calls on the database without 
knowing the specifics of the database implementation. 

An example API is the Microsoft dynamic link library (DLL). A DLL is one 
or more functions created from the ground up to provide run-time support to the server. 
DLL’s are compiled, linked, and stored separately from the processes that use them and are 
loaded (once) into the same memory space as the server. When a user clicks on a URL that 
requires the DLL, the server first checks to see if it is already in memory. If not, it is loaded 
and used, but if it is so, then there is zero overhead associated with loading. Because of this 
co-location in memory, DLL’s have access to all the resources available to the HTTP server 
and, since DLL’s only need to be loaded once and they avoid the added overhead of repeated 
loadings or request calls to external scripts, DLL’s gain a five to ten-fold increase in 
performance over CGI [Microsoft96-2]. 

Data access is accomplished in a manner similar to the CGI methodology 
discussed earlier. 

3. Providing Security 

Surrounding all data access methods is the need for security. This need transcends 
all components of the Web/DBMS link primarily because the Internet was originally created 
to support a benign network of mutually-supportive academic and government entities. 
Internet protocols are thus inherently insecure. Because of this, web servers and client hosts 
require Web/DBMS link components to additionally provide robust protection capabilities, 
but without overwhelming usability and performance requirements. 
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The most common security vulnerabilities associated with the Internet [Hughes96] 
include: 

• hazardous traffic: meaning traffic associated with any attempt by 
unauthorized users to enter and harm the Web site. 

• lack of confidentiality: the Internet protocols assume networked computers 
will only acknowledge traffic meant for them and ignore all other. 

• unidentified systems: since there is no reliable manner to permanently bind 
a specific computer to a specific IP address, it is difficult to reliably identify 
networked computers. 

• unidentified users: most user authentication schemes rely on a password that 
is passed in the clear and thus available for copying and re-use. 

• unidentified servers: it is difficult to reliably identify the actual server 
performing an application service. 

• modified data: because of the open nature of the Web, it is easy for 
malevolent entities to capture, copy and alter data packets prior to arrival at 
ultimate destination. 

Most Web applications address this situation by using server-side components to 
prevent the unwanted release of personal data or unauthorized data manipulation both at the 
server and en route to or from the client. 
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a. Firewalk and Proxy Servers 

For the Web site itself, the most common way to provide basic user 
authentication and authorization security procedures is with firewalls and specialized HTTP 
servers called proxy servers [Hughes95]. In their simplest form, firewalls are screening 
mechanisms that discriminate between valid and invalid users or processes depending upon 
passwords or client host Internet addresses. The successful [Hughes96] firewall utilizes the 
philosophy of “those not known to be a friend are considered foe and denied access.” 

Considering the example database, if Ralph wanted to ensure only he could 
make changes to MENU or CUSTOMERS, a firewall could be programmed to forbid access 
to anyone who did not access THAI-SHACK from Ralph’s computer and type his password 
in correctly. 

Proxy servers contain additional functionality extending basic firewall 
security by straddling the firewall to “focalize application traffic” [Hughes96] via processes 
that listen for and respond to client page requests coming over the Internet. Conventional 
use of a proxy server [Hughes95] would have it compare the user’s identity to a list of pre¬ 
approved users with passwords, chent TCP/IP addresses, or client host names against a pre¬ 
approved list on file in the server. 

Once the authentication and authorization service has been completed, the 
proxy server forwards the request to the database application for processing and then returns 
the query results via the HTTP server to the client. Proxy servers can also be used to perform 
such other duties as caching heavily used documents that don’t need to be created on-the-fly 
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or converting the database output into HTML, and supplementing audit capabilities by 
logging transactions as they occur. 

D. MIDDLEWARE COMPONENTS 

As shown in Figure 1, facilitating the interaction between client and server processes 
is middleware, a vague term covering all distributed software such as communication stacks, 
distributed directories, authentication services, remote procedure calls, and Queuing services 
[Orfali94]. Middleware is the “glue” that provides inter-operability between proprietary 
entity or process protocols that would otherwise be unable to communicate. What 
middleware does not include is the DBMS or other applications providing the actual service, 
the user interface or any logic that supports the application. 

Middleware can be divided into the two main classes of service-specific functions 
which enable a particular client-requested service and general functions such as 
communication stacks, the RPC’s, queuing services, and other services that comprise the 
enabling substrate for the service-specific functions [Orfali94]. 

Database middleware represents more than half of the entire middleware market 
[Freeman96] and can be either open or proprietary. Developers must therefore choose a 
particular middleware component based upon whether their link will be optimized for a 
specific application or standardized to provide “lowest common denominator” [Orfali94], 
or cross-vendor, access to multiple applications. In other words, to provide broad-based 
support middleware must forgo some of the performance and functionality enhancements 
that vendor-focused solutions are able to provide. 
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This section looks at the most prevalently utilized Web/DBMS link middleware 
components and discusses how they try to finesse the limitations of standard Internet 
middleware protocols and add the concepts of state and connection-oriented co mmu nications 
to database transactions. 

1. Protecting Data En-route 

If basic firewall authorization and authentication schemes don’t provide enough 
security, then encrypting the data prior to passing it over the wire may be required. Server 
Sockets Layer (SSL) [Hughes95], developed by Netscape, is an encryption scheme that 
provides a secure client and server connection along with guaranteed message authenticity 
and server identity authentication services. What SSL does is scramble the data at the 
application layer prior to passing it down to the network layer. This renders any intercepted 
packet unreadable by unauthorized readers. 

S-HTTP is a standardized form of the HTTP protocol [Hughes95 ] that also encrypts 
data above the network layer in order to guarantee data integrity, non-repudiation and user 
authentication. Working alone or together, SSL and S-HTTP and components like them are 
the most prevalent middleware method of protecting users. 

2. Common Object Request Broker Architecture And Internet Inter-ORB 
Protocol 

The basic Common Object Request Broker Architecture (CORBA) is a message- 
based middleware specification [Vinoski93] that provides high-level object-oriented support 
of applications inter-networking across diverse architectures and infrastructures. In 
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CORE A’s case the high-level support is an additional abstraction layer superimposed over 
underlying network and application layer protocols. CORBA is implemented with client- 
side and server-side skeletons via an “object-dot-method” nomenclature that allows 

a client object to call the procedures of an object located on the server. The server-based 
object responds by invoking the requested procedure and returning the results in CORBA 
speeified format. 

The Internet Inter-ORB Protocol (HOP) [House96] addition to the CORBA 
specification offers the advantage of having Internet-based CORBA server objects retain 
state and bypass HTML and CGI. This offers the advantage of reducing HTTP server 
overhead and extending database application functionality but comes at the cost of some 
extra CPU cycles used for messaging formatting, data-copying and parameter marshaling 
[Schmidt96]. It has been suggested that this may limit CORBA BOP’s use in high-speed 
applieations [House96], but for the average Internet connection, ease-of-use and data-type 
extensibility overcome any performance penalties. 

3. Open Database Connectivity 

Some vendor-specific middleware protocols are rapidly becoming de-facto standards 
for Web/database connectivity. These proprietary “standards” are utilized primarily in a 
three-tier architecture and stand between the database and the API or gateway to enable 
cross-vendor and multi-DBMS access by Web servers. 

Open Database Connectivity (ODBC), a proprietary (Microsoft Corporation) 
specification [Lambert96], is one standard that attempts to solve the problems associated 
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with accessing data from multiple proprietary databases in the heterogeneous Web/DBMS 
environment. As illustrated in Figure 5, the ODBC architecture has four major components: 



Figure 5. ODBC Architecture 


the data source, which can be almost any DBMS; the driver, which modifies application 
requests to conform to data source syntax; the driver manager, which loads specific data 
source drivers when needed; and the ODBC application, which performs functions required 
for submitting queries and retrieving results. 

Together these components provide a call-level interface (CLI) that allows Web site 
developers to make calls in a generic database interface language and have ODBC translate 
the call into specific database driver syntax. 

ODBC problems abound, with the most serious one being that the standard is 
controlled by Microsoft and thus constantly evolving. Furthermore, experience has shown 
that the ODBC driver/layer interfaces are difficult to build and maintain, introduce a lot of 
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overhead to simple SQL queries, and can never be as fast as native API’s that bypass the 
additional layers of any CLI [Orfali94]. 

4. Java Database Connectivity 

Similar to ODBC in nature, JAVA Database Connectivity (JDBC) is a specification 
and API of the JavaSoft division of Sun Microsystems [Linthicum96]. JDBC was built to 
give Java a database connection and also to be implemented as an additional layer on top of 
ODBC so it can work with ODBC-compliant databases without change. JDBC consists of 
a series of abstract Java interfaces addressing standard database query requirements and it 
uses a simple two-tier architecture (Figure 6) for applet-database connectivity in which 
applets executing within the browser environment connect back to a database server located 
somewhere else on the Internet. 



ODBC and 
Database Driver 



Figure 6. JDBC Architecture 
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The JDBC API connects the Java application to the JDBC manager, which in turn 
uses the Driver API to connect the JDBC manager to either a JDBC-specific database driver 
or to the ODBC database driver via a JDBC-ODBC bridge. In addition to traditional two-tier 
implementations, JDBC also provides three-tier access to database servers in which an applet 
can call a middle tier service layer (such as a CGI or API process) to make the call to the 
database server. By sitting on top of ODBC, JDBC is well positioned to provide database- 
neutral communication services, but it must address other issues before it can be considered 
a mature technology. These issues include database and host computer security, database 
recovery from interrupted or dropped transactions, and performance issues relating to the 
addition of another abstraction layer. 

E. COMBINING THE COMPONENTS 

After looking at the most common link components, a choice must be made regarding 
which of them to use when building a Web/DBMS link. The correct combination of client, 
server and middleware link components is ultimately in the eye of the builder, since the 
choice of which to use depends upon such questions as the stated purpose of the site, 
anticipated usage levels, available resources, and the emphasis each developer places on the 
different and competing link issues. 

Given the large number of components available, the numerical possibilities for 
combining them are quite high, but if an assumption is made that the client will utilize a 
simple browser with the possible addition of a widely used plug-in, then the number of 
possible combinations drops. In light of this assumption, closer inspection shows most 
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Web/DBMS link server implementations actually fall into one of the following general 
categories [Reichard96]: 

• CGI gateway 

• API (including ODBC) 

• Web-enabled DBMS’s 

• Independent processes 

• Other solutions 

Chapter IV introduces issues surrounding all Web/DBMS links and discusses the 
tradeoffs required between them when choosing one of these general link implementation 
approaches. 
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IV. WEB/DBMS LINK ISSUES AND TRADEOFFS 


A. MARRIAGE OF TWO TECHNOLOGIES 

The technology behind the Web/DBMS link basically represents a new marriage of 
old technologies. As stated in Chapter I, the WWW is a client/server implementation built 
upon insecure, connection-less, and stateless protocols, and most Web-connected databases 
are variations of the relational database model. Since each of these two technologies’ 
strengths and weaknesses are well understood, the novelty and excitement surrounding the 
Web/DBMS link results from connecting these technologies in manners at odds with their 
underlying precepts. Examples of this inherently non-symbiotic linkage include the usage 
of relational databases in an operating environment lacking total control of access method 
and duration and the use of native WWW protocols to support interactive sessions of long 
duration. 

The creation of a Web/DBMS link additionally involves deciding and committing to 
such mutually exclusive development paths as the use of tediously hand-written code 
specifically tailored to user needs, or the use of integrated RAD software packages that may 
entail limitations to flexibility or complexity. Other issues include considering whether the 
site should grant secure or secured access to remote users, the use of freeware instead of 
commercial products to save money, and the ability of normal site administrators to support 
the final link product instead of requiring the expense of hiring of skilled technicians. The 
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one constant throughout all link approaches is that each has its own advantages and weakness 
relative to these issues and so any approach chosen must involve trade-offs. 

B. OVERARCHING ISSUES 

1. Performance and Functionality 

Time delay between user input and system output is an issue that can be affected by 
such diverse factors as network load, server computational power, and limitations placed 
upon the number of simultaneous client accesses. Regardless of the cause, as delays increase 
more and more users begin to consider the delay excessive and start to seek alternative 
interaction methods [VisualWave96]. 

In the early stages of the Internet, high-performance content delivery was easily 
accomplished with quick and simple file servers that could also provide caching of heavily- 
used pages that, since they were static text, did not change from user to user. As the Web 
has evolved, though, server capabilities have had to change in response to the need to 
dynamically generate web pages and tailor page content for each user. These changes in 
server capabilities must still keep Web/DBMS link access time down, and do it without 
negatively impacting flexibility in design or content complexity. In other words, the ability 
to handle additional media and data types or unpredictable peak loads in usage must come 
without loss of usability or functionality. 

The DBMS itself can affect performance [Francett96]. In addition to the DBMS’s 
ability to support multiple simultaneous users, the DBMS must be able to generate pages 
comprising disparate and rich data types without perceptibly slowing. For this reason, some 
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corporations are choosing to provide only text-based data instead of richer data utilizing 
graphics or other such bandwidth and computation-intensive types [Francett96]. Finally, the 
machinations required to retain DBMS state information in the face of standard Internet 
protocols can also negatively impact site throughput. 

2. Cost 

These evolutionary changes in server capabilities must also be made with an eye on 
the resources required to implement the change. Since money is needed to buy hardware and 
software, create content, and maintain the site as it grows and reacts to changes in access 
levels or content, it is fair to say that funding availability affects a Web site throughout its 
life. If funds are limited, tradeoffs must be made between fund availability and the 
integration of solution approaches. For instance, a content provider looking for low-cost 
solutions could initially choose to piece together shareware and freeware components. With 
additional resources, though, it is possible to use the more expensive integrated Web/DBMS 
development and maintenance suites or even to coimnission highly expensive consultant- 
provided turnkey solutions. Basically, the more money that is available, the easier it is to 
rapidly field a complete solution. 

3. Development Time and Ease 

Competition is fierce among content providers trying to attract ever larger numbers 
of visitors, and so any Web site desiring to continually improve must have rapid development 
and revision ability. Given adequate funding, integrated RAD Web-server development 
products can be used to dramatically reduce site development time by offering “wizards” that 
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insulate developers from arcane scripting languages and quickly guide them through the 
creation of complex applications. These Fourth Generation Language (4GL) GUI tools such 
as Borland Corporation’s Delphi are intuitive to use and are re-defining the word “rapid,” but 
since each one can lock a developer into a specific set of proprietary languages and protocols, 
the decision of which of them to use is a paramount issue that can significantly impact 
Web/DBMS link viability. 

4. Serviceability 

Along with content evolution, the Web-associated protocol standards are also 
continually evolving. Because of this, the maintenance of Web server and client software 
must be considered up-fi'ont and early to ensure that addressing and incorporating these 
evolutionary changes doesn’t become a money and time sink. Basically, the easier the 
Web/DBMS link is to service, the easier it is to ensure costs remain low. Site owners must 
therefore have the ability to reconfigure or upgrade the Web/DBMS link easily, and to make 

content changes quickly without high levels of training or hiring the services of outside 
consultants. 

5. Flexibility and Openness 

Once cost, schedule and performance requirements have been defined, the 
Web/DBMS link builder should look to the creation of an “open” link using flexible 
Web/DBMS products that support cross-commercial and cross hardware/operating system 
architectures. The benefits of this flexibility, or openness, is multi-vendor support, platform 
independence, and the ability to rely upon standard browsers that avoid proprietary (and 
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therefore expensive) client software. Openness is indicated by the number of different 
database models and specific DBMS's supported, along with the ability to “multi-home” data 
and spread access across multiple DBMS’s using different application logic protocols. On 
the opposite of openness are single-vendor turnkey products requiring total commitment to 
one link vendor, one hardware instruction set, and thus one set of proprietary protocols. 

6 . Security 

Contrasting with the need for openness and requirements for users to query a DBMS 
and submit their own content is the need to protect Web site assets and provide this world¬ 
wide access to only those users with proper permission. Traditionally, databases have relied 
largely on the operating system for security [Bina94], but linking to the Web changes that 
and requires either the DBMS or another system to substitute in this role and deal 
appropriately with requests from unknown or unauthorized individuals. 

No Web-related issue has received more press than security, and industry concerns 
include user authentication and authorization, multi-level access based upon privileges, 
prevention of identity spoofing or forced entry, navigation control while users are within the 
Web site, and of course, database security and the protection of privacy or personal data. 
That this protection of data within the DBMS must be guaranteed for a Web/DBMS link 
approach to be viable is certain. What is equally certain is that this protection cannot come 
at the expense of the product’s ability to support client requests. 
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7. 


State and Session 


State is the other troublesome issue associated with Web/DBMS linkage and we 
cannot avoid addressing it since the WWW was built with the stateless HTTP session-layer 
protocol placed over the connection-less TCP/IP transportation-layer protocol. As previously 
discussed, state addresses the question “where in a series of related computations (e.g., 
“session”) are we?” In traditional client/server the session begins at logon and ends at logoff, 
and almost all advances in the development of the relational database model presupposed an 
ability to progress through a transaction (session) in a serial fashion, with each step being 
cognizant of what had previously transpired. HTTP’s inability to support retention of place 
or state within a transaction has led site builders to such “work-around’s” as limiting users 
to one-shot DBMS queries, creations such as Netscape Corporation’s “cookies” to store a 
record of state on the client end, or the pushing of user identity and session state through 
hidden intra-web site hyperlinks. Regardless of how the site builders address the state issue, 
they must address it. 

C. DBMS ISSUES 

1. Interface Usability and Control 

In contrast to previously discussed issues, certain Web/DBMS link issues such as 
usability, transactions, concurrency and data type extensibility are specific and internal to the 
DBMS. The DBMS portion of the link should be the least problematic since the relational 
database has been in use since the 1960's, but problems can still arise if the underlying issues 
are not properly considered prior to development. Foremost among these issues is interface 
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“usability” and the level of control users have over their own sessions. The interface should 
support intuitive Web site navigation and personal tailoring of robust database queries, 
rapidly giving end users their data in desired formats. 

This issue of usability as it relates to the intricacy of database queries to be supported 
(i.e., the decision of whether the client will conduct simple searches or whether initial query 
results will be utilized in forming subsequent queries in a cascading fashion) affects Web 
sites from the outset. To be independent of the server, complex query support requires the 
use of a fat client, and yet the two "standard" browsers (Microsoft Internet Explorer and 
Netscape Navigator) that currently dominate the Web market [Ayre96] are thin clients. A 
decision must be made with respect to the level of DBMS interactivity end users will enjoy 
and how that interactivity will be supported. 

2. Transactions 

Properly supporting complex interactivity requires the builder to know and 
understand the limitations associated with specific transactions. A database transaction can 
be one of two types: conversational and pseudo-conversational [Finkelstein96]. In 
conversational transactions a dedicated server process is created for each browser user and 
inaintained until the end-user terminates the connection or a time-out occurs. In other words, 
the unit of work to be accomplished is extended to include the entire duration spent on the 
transaction. The advantage of this approach is that the server maintains information about 
the user’s previous data requests and proceeds without interference from other transactions 
since the database is reserved for the exclusive use of the transaction for its entire duration. 
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The disadvantages are the serial nature of transactions and the resulting requirement for 
server dedication of limited resources to each browser client that has recently submitted a 
request regardless of whether any further processing is needed. This can quickly overload 
the server and degrade performance, especially if numerous transactions are initiated and 
then abandoned, leaving the server to wait for them to time out. 

Pseudo-conversational transactions partition the work into subunits. This division 
of work allows the database to be reserved and released more often, but with the penalty of 
the added effort required to ensure inter-woven transactions do not adversely affect each 
other’s accuracy and data integrity. 

With either type of transaction the continual opening and closing of the database has 
an effect on server performance. 

3. Concurrency 

Concurrency [Burleson94] addresses the problem of users affecting each other’s 
transactions by the reading or writing of data during the other’s transaction. Problems that 
can arise include dirty reads in which a retrieved record is held by another transaction with 
intent to update, non-reproducible results when queries are run against a database in the 
middle of an update, or bad pointers arising during active updates when previously indexed 
records no longer exist due to inserts or deletes. In all cases, the number of simultaneous 
DBMS connections that will be allowed affects the granularity and complexity of DBMS 
concurrency. The Web/DBMS link should support the same level of transaction concurrency 
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required in other distributed database applications and do it with a well-conceived locking 
procedure. 

Locking protects data integrity and ensures that a database transaction properly 
retrieves and updates information by limiting the number of transactions having database 
access at the same time. Locking can take place at different levels, e.g. rows, tables, or even 
the entire database, and can be either exclusive to one process or shared by two or more 
processes depending upon the nature of database access required. 

4. Extensibility 

The final DBMS-specific issue, extensibility, addresses the inherent capability of the 
DBMS to easily Md quickly adapt to new data forms. As the Web continues to evolve, new 
data types such as graphic files or audio clips will become more prevalent and, if keeping 
pace with other sites is an issue, the developers’ chosen DBMS must extend to include the 
ability to store and retrieve these data types. As mentioned before, restriction in data types 
can lead to restrictions in site growth and adaptability. 

D. RELATING THE ISSUES 

1. Tradeoffs 

The challenge to all site builders is to design Web/DBMS links producing the 
relevant information (no more or less), at the right time (no delay), in the appropriate form, 
and at an acceptable cost [Jeffery95]. In a perfect world there would be no downside in 
financial, performance, or lost opportunity terms to having every link address every link issue 
adequately. 
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In truth, though, some of the issues conflict with each other, requiring a site builder 
to consciously decide to give one or the other more emphasis. Some of the more common 
tradeoffs include: 

• cost versus development time/ease: the absolute cheapest way to create a 
dynamic web site is to use freeware and shareware in “hand programming” 
a tailor-made link. The problem is that the money saved on software could 
ultimately be spent on labor since programming the complete solution can be 
time intensive, require a lot of programming experience, and the testing and 
de-bugging can also be difficult. Integrated 4GL development packages 
avoid these problems but in general are quite expensive to purchase initially. 

• openness versus performance: the best performance comes from code that is 
vertically integrated from the beginning to offer a complete solution with no 
extra functionality or abstraction layers. As the capability to support more 
protocols and functions are added, more abstraction layers are added that can 
adversely impact performance speed. 

• cost versus serviceability: closely related to the first bullet, the more hand 
programming involved in the initial creation of a link, the more hand 
programming involved in the servicing and upkeep of the site. Time is 
money. 

• security versus performance: just as more abstraction layers can negatively 
impact performance, the addition of more gate-guard requirements can also 
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be of significance. This is especially true when the additional layers are 
themselves computation intensive or require additional use of network links. 

• transactions control versus performance: time and effort spent on controlling 
simultaneous DBMS access to prevent mutual interference between clients 
can delay the production and transmission of results. 

• data-type versus performance: as previously mentioned, the richer the data 
type, the more network bandwidth and server computational power required 
to maintain rapid results. 

2. Assessing the Combination Categories 

Now that the more common trade-offs have been highlighted, a close look can now 
be taken at how these common tradeoffs apply to each of the aforementioned general link 
categories. This will allow conclusions to be drawn on the relative advantages, 
disadvantages, and tradeoffs associated with each one. The first three categories discussed 
assume the use of a standard browser client. Then the use of a plug-in to facilitate 
independent process is discussed and finally, a brief discussion of fat clients and certain 
proprietary middleware solutions concludes the chapter, 
a. CGI Gateway 

CGI’s primary advantages are its ubiquitous support among Web servers and 
programming languages, and the large body of previously-written software modules that can 
be pasted together by experienced programmers to provide tailored solutions to almost any 
link problem. CGI’s biggest downfall, aside from the requirement for strong programming 
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skills, is the performance deficit it displays in relation to code written and compiled to 
support a DBMS in its native language, and the load it places on the Web server to spawn 
a process and open and close the DBMS. Use of CGI is a tradeoff of performance for cost. 

b. API (including ODBC) 

API’s are the new wave in Web/DBMS linkage offering dramatic 
improvements in performance. These speedy links come at the expense of openness and the 
lack of universal standards and Web server support. To utilize a specific link, proprietary 
software products must be purchased, thus locking the site builder into a long-term 
relationship with a specific vendor. The middleware “API’s” like ODBC attempt to address 
this proprietary nature of API’s but they impose the performance burden of an additional 
software abstraction layer. API’s represent a tradeoff between performance and openness, 
and they come at a greater financial expense than CGI. 

c. Web-enabled DBMS’s 

Utilizing the DBMS itself to produce the HTML file for the Web server to 
ship is an innovation that offers promise. Templates and special HTML code or embedded 
SQL can all be used alone or together to relieve the server of the computational burden of 
gateways. Some of these solutions are proprietaiy and they can cause problems for standard 
browsers that don’t recognize and properly handle special tags. The primary benefit of these 
methods is the retention of the standard browser for the client and the centralizing of the 
solution at the server. Financially these solutions can be prohibitive for the small enterprise. 
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d. Independent Processes 

With the addition of a browser plug-in, the size and capability of the client 
software is moving back in the direction of the traditional distributed DBMS fat client. The 
gain in putting some of the application logic back on the client is the spreading of work 
across all participating entities and the associated possibility for a reduction in network 
transmissions. These processes represent the newest link solution and are therefore the least 
understood with respect to their impact on all the issues. Initial indications are that 
additional client capability comes at the expense of performance and the possible 
compromise of security. 

e. Other Solutions 

In this category are fat clients and special middleware like CORBA that 
successfully resolve state and session issues at the expense of openness. Problems primarily 
revolve around the decentralized nature of these solutions requiring special client-based 
components and functionality. These components bring with them all the traditional 
problems of upgrade and serviceability that accompany any non-standard browser based 
solution. 

3. The Ideal Solution 

In the absenee of technological limits, the ideal WWW/database link would offer the 
scalability of traditional client/server systems along with browser independence, robust 
queries on a secure database, speedy response times, 4GL rapid application development 
techniques, persistent state and session management, and multiple, cross-DBMS support that 
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allowed for easy integration with emerging technologies. In reality, due to the tradeoffs that 
must be made among the underlying issues, no single linkage approach offers a perfect 
solution. The remainder of this thesis, by covering the steps in designing a Web/DBMS link 
for a prototypical case study, serves to further illustrate the complexity and nuances 
surrounding each issue, and the tradeoffs between them. 
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V. CASE STUDY 


This case study focuses on the process whereby The Institute of Electrical and 
Electronics Engineers (IEEE) Signal Processing (SP) Society reviews the format and content 
of technical papers submitted for publishing in the Society’s journal, IEEE Transactions on 
Signal Processing. 

The Society currently receives approximately 700 papers a year from authors desiring 
their papers to be published in the Society’s journal [Moura96]. For each of these 700-plus 
papers, the Society requires two or three qualified reviewers to be found, and their services 
enlisted, to review the paper’s format and content. The administrative logistics required to 
find, solicit, and support enough reviewers to complete approximately 2100 reviews per year 
is difficult. 

A. BACKGROUND 

1. Current Review Process 

As outlined in Appendix A and illustrated in Figure 7, authors desiring publishing 
must first suggest an appropriate Editors’ Information Classification Scheme (EDICS) 
number for their paper based upon its subject matter. The EDICS [1EEE96] is a SP sub¬ 
subject numbering system which is used to categorize and map a paper’s subject matter to 
the most appropriate sub-area within the overall SP subject. It is also used for assigning 
areas of responsibility to respective Associate Editors (AE’s). Of note, each EDICS is 
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associated with several AE’s and each AE is responsible for several EDICS, so overlaps are 
possible. 

Once the author has chosen an appropriate EDICS code for his paper, he mails seven 
copies of the manuscript, abstract and references to the Society. Then, based upon the 
suggested EDICS code. Society staff chooses one of the code’s associated AE’s and forwards 
to him six of the seven manuscript copies. 


Author 

_ J 

* Submit paper 

* Negotiate content 
through AE ■ 


Reviewer 

#I 


* Review paper 

* Negotiate content 
through AE 



Society 

V J 



* Receive paper 

* Choose AE, send 6 copies 


AE 

_ J 

* Choose Reviewers, send copies 

* Facilitate Author-Reviewer 
negotiations 

* Decide to Publish/Reject 

* Nofify Author, Society of decision 


Figure 7. Paper Submission and Review Process 


Upon receiving the paper copies, the AE must find the two or three reviewers that 
most appropriately match the paper’s content and who also have the time and inclination to 
participate in the review process. Although the Society will sometimes suggest the names 
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of some potential reviewers, most of the time the AE alone is responsible for identifying and 
negotiating the services of qualified reviewers for each of the papers he is assigned. 

The most common method utilized by AE’s for reviewer identification is to build an 
ad-hoc list of names from personal acquaintances, word-of-mouth references, conference 
attendee lists, published paper references, journal subscription lists, etc. Then, to solicit 
reviewers from the lists, the AE’s conduct a blanket mailing of solicitations. This mailing 
is generally expected to yield a fifty-percent success rate in terms of responders willing to 
review the paper [Fargues96]. When enough of the contacted reviewers have acknowledged 
the solicitation favorably, the AE mails them copies of the manuscript. Of note, some 
reviewers use students in a mentor-apprentice relationship to perform “ghost reviews” for 
the reviewers subsequent check. 

Each of the reviewers reads the manuscript, annotates comments and suggested 
changes, and then mails the papers back to the AE for forwarding to the author (who remains 
“blind” to the identities of his paper’s reviewers). The author responds to the suggested 
changes by either incorporating them outright or debating them (via the AE) with the 
reviewers. This “negotiation stage” may go through several iterations prior to all parties 
reaching complete agreement. Once the author and reviewers have finally agreed upon the 
paper’s content, the AE must decide whether to publish or reject the paper and then notify 
the author and the Society of his decision. 

Currently, about half of the submitted papers are eventually published after 
negotiations lasting 14 months on average but, in some cases, extending up to two years. 
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The remaining papers are rejected for reasons including incompatibilities with format or 
length restrictions, paper-Joumal audience mismatch, work overlap between authors, or even 
disagreement over the importance and relevance of the paper’s material. 

2. Changes In Work 

Several changes to the current process are being implemented. First, to alleviate 
some of the AE’s workload, the Society plans to wait until the AE has provided a list of 
reviewers and then it will mail the manuscript copies directly to the reviewers. This will be 
done instead of forwarding six copies of each manuscript to the AE for his subsequent re¬ 
mailing to reviewers and therefore allows the Society to monitor time-lag associated with 
reviewer assignment. 

The second change [Moura96] contemplated by the Society is to create the position 
of “designate reviewer.” A designate reviewer is an approved reviewer who enters into a 
relationship with an AE agreeing to accept, for prompt review, a limited number of 
manuscripts for review per year. It is hoped that this pre-agreement will greatly streamline 
the reviewer solicitation process, protect the reviewers from being over-utilized, and 
additionally serve to create a database of reviewers with relevant biographical and review- 
related data. 

The third, and relatively minor, planned change is to require the reviewers to pre¬ 
agree to completing their review within six weeks of manuscript receipt in order to reduce 
the length of the negotiation stage and help ensure that manuscripts are published on a timely 
basis. 
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a. 


New Database 


Lastly, as discussed in Appendix A, the Society has hired an outside 

/ 

consultant to create a Microsoft Access95 database application to hold data relevant to the 
manuscripts and personnel associated with their review. The application is currently being 
implemented on a standalone PC (that is not dedicated solely to the database) but it will 
eventually be moved to a dedicated PC server so all of the appropriate Society staff can have 
unrestricted access. The database is designed to do/aid the following; 

• Log in submitted manuscripts as they arrive 

• Automatically generate a receipt letter to send to the author 

• Automatically generate a list of AE’s responsible for the manuscript’s EDICS 
code 

• Track AE assignments and limit AE responsibilities to no more than three 
manuscripts per month 

• Provide AE’s with suggested reviewer names 

• Keep statistics such as author and reviewer periodicity, or in other words, the 
number of papers an author has submitted or the number of times a reviewer 
has been used. 

3. Database System Support 

The new application can be broken into the three main sub-components of hardware, 
software, and the data/data organization contained within the software. 
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a. 


Hardware 


The database is implemented on the following computer hardware: 

• CPU: 133 MHZ Pentium 

• RAM: 16 MB 

• Hard Drive: 1 GB 

• CD-ROM 

• TAPE BAJ 

• MODEM 

• NETWORK INTERFACE CARD (Ethernet) 

The computer is currently connected to an internal, cross-departmental, 
Ethernet-based, Windows peer-to-peer LAN that is connected to the Internet. This 
connection may change slightly in the near future since the journal staff wants to separate 
from the enterprise-level LAN for privacy reasons and to facilitate an impending move to 
another physical location. 

b. Software 

Software support for the reviewer database includes Access95, a Microsoft 
Corporation implementation of the relational database model, and the Microsoft Windows95 
operating system. The current plan is to give only three staff members password-enabled 
access to the database. An additional requirement exists for remote access via modem due 
to one staff member requiring the ability to work from home. 

c. Database 

The relationships captured by the new relational database are illustrated in 
Figure 8. Although only three main entities (PAPERS, PEOPLE, and EDICS) exist, the 
relationships between them are somewhat complex because of the multiple many-to-many 
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relationships between PAPERS and PEOPLE. Since all personnel related to the review 
process are members of the PEOPLE entity, a single paper can have many relationships with 
the PEOPLE entity; one for each author, one for each reviewer, and one for the AE. 
Additionally, each author, reviewer and AE can be associated with more than one paper. 

With respect to PEOPLE and EDICS, each AE can be responsible for multiple 
EDICS codes while each EDICS can be associated with multiple AE’s. The same many-to- 
many relationship should be included between the reviewers and the respective EDICS codes 
but neither the conversations recorded in the Appendices nor careful scrutiny of [IEEE96-2] 
reveal this relationship to exist within the database. The resulting effect of this “non¬ 
relationship” is that no capability currently exists for categorizing and sorting reviewers 
according to EDICS. 
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The three main database tables are named to match the relational entities. The 
first of these called PAPERS, is for the submitted manuscripts; the second, called PEOPLE, 
is for the review-related personnel, reviewer, author, and AE, and the third; called EDICS, 
contains all journal-related information. Tables 3, 4, and 5 list the actual database tables 
[IEEE96-2] and their respective fields. Explanatory notes are included for some of the fields 
as appropriate. 
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Field 


Notes 


Id# Primary Key 

Title 

Contact Author Foreign Key (PEOPLE.ID#), primary negotiation author 

Authors Foreign Key (PEOPLE.ID#), up to 6 additional authors 

Date Rec'd 

EDICS Foreign Key (EDICS.Code#) 

Status AE/reviewer assignment complete, paper type/accept, etc. 

Last Changed date status changed last 

Over-length yes/no, yes = auto reject 

Inc Format yes/no, yes = auto reject (incorrect format) 

AE Foreign Key (PEOPLE.Editor) 

Acc Type, Pub Type correspondence/paper 

#Pages 

SDATE2 date AE assigned 

Rev Date 
Pub Date 
Copyright Date 

yes/no 
yes/no 

#Figures | 

media type (diskette, email, unknown) 
negotiation revisions 
#pages in final version 
yes/no, yes=abstract to be printed 
date 

Tables. PAPERS 

NOTE; A record is created for each manuscript received whether it is kept for review 
or rejected outright by the staff for format reasons. 


DISC 
#Rev 
Pub Page 
ABSTR 
Abstr Rec’d 


Artwork Rec’d 
Bio Rec’d 


65 

















































Field 


Notes 


Id# 

Primaiy Key 

First 


Last 


Title 


Salutation 


Phone 


Address 

actually 5 fields, ADD1...ADD5 

FAX 


Email 


Editor 

yes/no 

Term Start 


Term End 


Reviewer 

yes/no 

Contact Author 

yes/no 


Table 4. PEOPLE 

NOTE: approximately 40 AE’s are activated on a daily basis for each of three 
Transactions (Digital, Audio, Image). 


Field 

Notes 

Code# 

Primary Key 

Transaction 

(The three Transaction journals share the 
same code numbers but the subject matter 
of each code number changes from 
Transaction to Transaction) 

Description 

Subject of the EDICS code 

Editors 

Repeating Field 


Table 5. EDICS 
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4. 


Other Plans 


The database is not the only change the Society has planned for the review process 
(Appendix A). Also being considered is the acceptance of electronically submitted abstracts 
and reference lists that can be forwarded to the AE’s to offer them a head start in identifying 
reviewers. Once reviewers are identified, the AE will simply send their names to the Society 
so it can, in turn, mail each reviewer a manuscript copy. It is hoped that this new procedure 
will reduce the amount of time that manuscripts spend awaiting reviewer assignment or in 
transit between the AE and reviewer, thereby speeding up the entire review process. 

As an aside, in January of 1997 the Society will also begin publishing letters to the 
editor on the Internet with the hope of eventually offering online subscription/access to the 
journal itself. What is not known at this time is how the current income from subscription 
fees will be replaced. Since the subscription fees are the lifeblood of the journal, this is a very 
serious concern. 

B. CASE ANALYSIS 

1. Review Process Problems 

The current reviewer solicitation process and the resultant time drag it imposes upon 
the entire review process is one of the top areas targeted for improvement by the Society 
[Fargues96]. Soliciting reviewers imposes an inordinate administrative burden upon AE’s 
and they are simply spending too many hours in relation to their other professional 
responsibilities in an attempt to find and communicate with enough new and current 
reviewers. Due to the self-generated nature of their reviewer lists, multiple AE’s are 
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soliciting and utilizing the same reviewers repeatedly, thereby causing the reviewer- 
workload mismatch which bums out the most willing of reviewers. 


Despite the AE’s best attempts to properly support the reviewer solicitation process, 
the sheer volume of papers submitted and the ensuing difficulty in finding enough qualified 
reviewers is causing many AE’s to short-circuit the solicitation process. As a result, the 
AE’s simply mail copies of the manuscripts to potential reviewers who they hope will accept 
the responsibility, and the first time many reviewers know of a paper’s existence is when it 
comes in the mail [Fargues96]. 

In summation, the following list details the problems with the current reviewer 
solicitation process that have been recognized by the Society [Kowalczyk96] as being among 
the first that it wants to address: 

• The continual shrinking of the qualified reviewer pool due to reviewer 
overload, 

• The virtual impossibility of gaining access to individuals outside of the small 
pool of previously-published individuals and AE personal acquaintances, 

• The fact that AE’s usually do not have much administrative support and 
therefore tend to sit on the manuscripts, 

• The fact that many knowledgeable and potentially willing reviewers such as 
Ph.D. candidates, newly hired faculty, and commercial industry 
representatives remain unidentified by AE’s and are therefore not asked to 
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participate due to lack of publishing experience or association with separate 
social/professional circles [Fargues96], 

• The fact that some authors publish regularly but are either unwilling to 
review the work of others or are not identified as potential reviewers (these 
members need to be identified and encouraged to help out), 

• The fear that a general call to the public to identify additional reviewers may 
generate too many responses from unqualified people or those whose 
qualifications would be difficult to vet, thereby overwhelming the small 
journal staff. 

2. Solving the Problems 

As with all problems, there are multiple solutions to those pertaining to the review 
process. Though the merits and cost/benefit ratios of each solution are debatable, what can’t 
be debated is that wasted time equals wasted resources, resources which could be used by the 
Society in other venues. In recognition of this, the Society has made the following 
modifications and additions to the current reviewer solicitation process. 

First, a full effort is being made to expand the database of designate reviewers to 
include every reviewer known to the AE’s and other Society members. The thought is that 
centralizing separately compiled reviewer lists allows the Society to gain control and 
standardize the reviewer qualification process, facilitate AE solicitations for reviewer names, 
and provide a central point of contact for record update by the authors, reviewers and AE’s. 
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This will additionally help the Society track reviewer work assignments to prevent reviewer 
overload and burn out and ensure author participation on both sides of the review process. 

a. Attracting More Reviewers 

The Society is also interested in investigating methods for identifying 
qualified candidate reviewers who are unknown to the Society. By registering these 
candidates and (once properly vetted) adding them to the pool of reviewers, a win-win 
situation is created in which the Society can increase the size of its reviewer pool while 
simultaneously allowing SP subject-matter experts who are unknown to the Society to break 
out from obscurity. It is by identifying themselves and becoming valued, knowledgeable 
partners in the publishing process that both the Society and the new reviewers receive value. 

The anticipated procedure for vetting these candidate reviewers would be for 
them to submit biographical information to the Society, have the staff conduct an initial 
review and then forward the applicant’s information to an appropriate AE. The AE would 
then verify the applicant’s qualifications and inform the Society whether the applicant should 
be added to the reviewer database. Specific biographical information to be captured would 
include all of the fields in the PERSON table plus those extra fields shown below in Table 
6 . 
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Field 

Notes 

Education Level Attained 

MS, PHD, etc 

Certifying Educational Institution 


IEEE Member Number/status 

can be targeted for membership if not already a member 

Biography 

suggestions include papers/articles published, special expertise. 


etc 

References 

with Contact Information 

EDICS Categories Wanting to Review 



Table 6. Candidate Reviewer Biographical Data 


The most important information that the Society needs regarding the potential 
reviewers are references and proof of educational certifications to ensure the motivation, 
authenticity, and credibility of the individuals. 

3. Using The Web 

One possibility for implementing the aforementioned changes to the reviewer 
solicitation process would be to link the new reviewer database to the Internet. This would 
allow potential reviewers to submit their own biographical data while offering the Society 
the potential of dramatically increasing the size of the reviewer pool by tapping into the 
continued growth of the Internet. At a minimum, this link would relieve Society staff of the 
mundane biographical data-entry tasks associated with new reviewers. Of more importance, 
it could also reduce the workload associated with soliciting new reviewers via such 
traditional search strategies as blanket application mailings, exhorting current AE’s and 
reviewers to “rush” new reviewers, and conducting telephone polls. 
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A Web link has yet to be created for the new database due to concerns 
[Kowalczyk96] over maintaining data integrity (sic. purity) and maintaining the sanctity of 
the review process. Protecting the confidentiality of reviewer identities from the authors 
dictates that only the office staff, actually responsible for maintaining the database and 
ensuring that data entries are correct, should have read/write access to reviewer, author and 
AE data. 

AE access to the new database via the Internet is also considered problematic since 
AE’s are only appointed on a voluntary basis for a short term lasting just long enough to 
oversee the review of 25 manuscripts. In addition to a valid fear of malevolent hacking, it 
is this revolving door of appointed AE’s (of whom a high percentage are authors) that has 
caused the Society to demand total separation of the new database and Internet to date. The 
concern is that just one mishap is one too many given the limited number of personnel 
available to “fix” bad or incorrect entries, regenerate lost information, and handle the 
political fallout associated with the revealing of reviewer identities. 

Despite these concerns, the Society’s fears are, in fact, not unique and are actually a 
subset of the Web/DBMS link issues discussed in Chapter m. Using the Web as a 
communication link between potential reviewers and the database can actually be a viable 
tool that the Society staff can use with confidence. This link will specifically alleviate 
prospective reviewer identification problems by allowing potential reviewers to submit their 
own personal data to the database for Society review. Also, by additionally enabling the 
AE’s to rapidly find the appropriate reviewers for a particular paper, the link aids the Society 
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in reducing the administrative burden associated with centralizing the manuscript review 


process. 

In summary, the creation of a Web/database link is a viable solution to the Society’s 
problems with the current reviewer solicitation process. The remainder of this thesis 
discusses the issues, functionality and additional system support required for implementing 
this link within the guidelines of the mandatory access eonstraints. It also details required 
changes to the currently implemented database which must be made to facilitate the use of 
a Web/database link. 
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VI. DESIGNING THE LINK 


When currently used systems no longer meet user needs, changes must be 
implemented. This chapter assumes the Society’s concurrence with the technical viability 
of Chapter V’s recommendation to change its current reviewer solicitation and designation 
process by modifying its manuscript reviewer database and linking it to the Web. The 
chapter provides a framework for implementing the recommended changes and begins by 
discussing the expected benefits and required constraints upon the Web/database link. 
Following this discussion is a listing of the advantages and disadvantages of using each of 
Chapter Hi’s general link approaches. Finally, the chapter concludes by first selecting the 
general link approach which best addresses the Society imposed constraints, and then 
specifying and discussing the system design to implement the link. 

A. USER REQUIREMENTS 

1. Expected Benefits 

To ensure that the Society’s needs and concerns are properly addressed, a thorough 
investigation of the expected benefits and user-imposed constraints is required. Primary 
among the benefits expected from linking the database to the Web is the Society gaining 
access to a larger pool of potential reviewers. This larger pool will reduce workload 
requirements for the individual reviewer while simultaneously decreasing the amount of time 
AE’s spend finding and enlisting the services of new reviewers [Fargues96]. Additionally, 
by combining the separate AE reviewer-contact lists the Society will gain: 
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• centralized administrative oversight and control of reviewer qualification 
process 

• reduction of AE administrative load 

• removal of the redundancy between AE lists 

• facilitation of the AE turnover process. 

In other words, centralization of the reviewer database will remove the AE reviewer- 
assignment bottleneck from the review process and, via the quick identification of specific 
reviewers in response to paper content, ease the building of reviewer teams. This ability 
could eventually facilitate the creation of reviewer mentor/apprentice relationships when 
appropriate. The end result will be a reduction in the 14-month lag from manuscript 
submission to publication. 

Another benefit of creating a database link and hosting a reviewer-specific web site 
is the fact that it is not a radical departure from currently existing TFFF websites including 
the main IEEE page (www.ieee.org) and the SP homepage (www.ieee.org/sp). The new site 
could take advantage of existing IEEE-related Internet traffic as it is a fairly simple matter 
to make the new site an extension linked to the other sites. 

2. Specific In-variants 

Due to the small size of the journal’s staff, the limited nature of its supporting budget, 
and the Society’s strong concern for its intellectual and computational assets, certain in¬ 
variants were stated [Kowalczyk96] from the onset for the Web/database link. 
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First among equals is the absolute imperative to preserve and protect database 
integrity from inadvertent mistakes or malicious intent. Of almost equal importance are 
protecting the sanctity and confidentiality of the review process and ensuring that the 
Society’s intellectual assets (sic. reviewer contact lists) are protected from competitors. This, 
in turn, requires that limits be placed upon the AE’s ability to search and access reviewer 
names and contact information. The AE’s access must be limited to just the particular 
author-reviewer grouping for which the AE is accountable. 

Also understood is that the Society desires complete control over the solution design 
and content to ensure that any new system implemented is completely compatible with the 
Society’s currently used hardware and software systems. Furthermore, flexibility and 
scalability for future content and size growth must be guaranteed and the link project must 
be inexpensive to maintain and deploy or it will not be implemented. 

Finally, when new reviewers register with the Society, registration must be 
accomplished in such a manner to facilitate quick and accurate Society verification of 
submitted qualifications. Care must also be taken to ensure the new reviewers are familiar 
with proper review procedures to protect all parties’ legal and privacy interests. 

3. Information Flow 

a. Web Input/Output 

As illustrated in Figure 9, the Web site will initially funnel two separate data 
streams into the database. The first stream is the prospective reviewer’s biographical data 
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as submitted via an HTML form, while the second contains AE queries submitted to the 
database as they search for reviewers associated with a certain EDICS code. 



Output of the Web site will also consist of two things. The first is a simple 
HTML page confirming to the prospective reviewer that their data has been accepted for 
investigation and that they will be contacted within a certain time frame regarding their 

status. The second output will be for the AE’s only and consists of the results of their 
submitted query. 

4. Sunwnary 

The overall requirement for the Web site is that it act as a clearing house for 
collecting and disseminating information related to the reviewer solicitation and paper review 
process with the end goal of making this process easier for all parties involved. 
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B. CHOOSING AN APPROACH 


1. Issues for Consideration 

When looking at the case analysis section in the previous chapter and the in-variants 
discussed in this chapter, one thing becomes clear: the Society’s concerns regarding 
connection of its database to the Web are actually real-life manifestations of the overarching 
link issues of Performance and Functionality, Cost, Development Time and Ease, 
Serviceability, Flexibility and Openness, Security, State and Session discussed in Chapter 
IV. This section reviews these issues as they relate to the case study. 

a. Overarching Issues 

It is understood that the most important issue of concern when choosing a link 
approach for this case study is the guaranteed security of the Web site and the database. No 
matter how well the other issues are addressed, it is a given that if security is not guaranteed, 
no Web link will be created. Therefore, the approach recommended by this thesis holds 
security in the forefront of all design decisions. 

With Society funds and available personnel a scarce commodity, the cost of 
the chosen link approach must include all additional system support purchases plus the 
consultant fees for implementation. Therefore, the development of the link must be rapid 
and easy, and the actual application must easily support continual re-design as the she 
changes and grows over time. Finally, the need to limit expenditures and ensure 
compatibility with currently used systems also requires the link to utilize and leverage as 
much of the installed system components and user expertise as possible. 
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The easiest issues to grapple with are the performance and functionality 
issues. Performance wise, with the limited number of AE’s who would have access to the 
database at any given time, a high performance server on the order of a catalog or auctioning 
service is not needed and a simple system will suffice. Additionally, completing and 
submitting a form to a Web site or executing a simple database query are two common 
functions which are well within the capabilities of most Web software programs. 

From a database perspective, the issues of transaction monitoring and 
concurrency limitations are relatively moot since they are “write” related issues, and as 
originally envisioned, all case study database writing will be done by the prospective 
reviewers. Since each resume constitutes a separate record, the prospective reviewers will 
not be accessing the same records and the requirement for concurrent processing is non¬ 
existent. With respect to the AE’s, although they will sometimes be accessing the same 
records simultaneously, concurrency is still not an issue since the records are related to 
currently vetted reviewers and all AE queries will be “read” only. 

2. Approach Alternatives 

To properly address these issues, the Society can choose a Web/database link 
approach from the general approach alternatives discussed in Chapter HI: 

• Common Gateway Interfaces (CGI’s) 

• Application Program Interfaces (API’s) including Open Database 
Connectivity (ODBC) 

• Web-enabled Database Administrative Systems (DBMS’s) 
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• Independent processes 

• Other solutions. 

A brief scan of the list reveals the following: use of a CGI gateway can be quickly 
discarded (despite its offered flexibility) because of its potentially deleterious effect upon 
security and the level of programming skills required to use it. With respect to independent 
processes, the writing of Java applets or Active-X controls can also be discarded, as writing 
them can be too hard for the non-programmer and the computer industry has yet to 
completely address all concerns regarding security. The final category to quickly be 
discarded is the catch-all category of “Other Solutions.” This category includes the 
proprietary approaches which suffer from hmited openness and security qualities, or are too 
expensive to deploy and maintain on a limited budget. 

The options actually constituting viable alternatives are the use of API’s (including 
ODBC) and Web-enabled DBMS’s. Either one offers a valid approach to addressing the 
Society’s link requirements, but as will be shown later in this chapter, API’s offer more 
flexibility. 

One option the Society could take is to convert its new Access95 database into a 
Web-enabled database [Simpson96] which actually generates and exports static HTML files 
directly to a Web server and on to the Web. This is accomplished via a free add-on called 
Internet Assistant for Access95, which is downloadable from Microsoft’s Internet site. One 
disadvantage to this method is the labor intensive nature of continually creating new static 
files every time that the database is updated. Despite the additional labor, if the Society only 
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wants to export static information to the Web without offering the capability of updating 
database tables or enumerating dynamic data, then the Internet Assistant for Access95 could 
be one avenue worth pursuing. Since the implementation is trivial and the offered capability 
is not within the purview of this thesis, the actual use of Internet Assistant for Access95 will 
not be explored further. 

3. Recommended Approach 

We recommend a solution falling within the category of “API (including ODBC).” 
The primary reason (outside of the potential for air-tight security) is that it offers complete 
compatibility with the Society’s installed software/hardware/user-expertise base and 
therefore eases the migration of the Society’s new database to the Web. This compatibility 
also serves to limit the amount of additional cash outlays and the training time required to 
bring the link to fruition. 

C. SYSTEM SPECIFICATION 

This section specifies required components and the system design for the 
recommended approach. Sufficient detail is provided to serve as a framework for Society 
implementation. 

1. System Support 

a. Software Applications 

Given that the Society is currently implementing its reviewer and paper 
tracking database on a IBM-clone PC with all Microsoft products, the link should be 
designed with the Microsoft software applications listed in Table 7. 
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Application 

URL of Location Extra Reference Material 

Access95 

www.microsoft.com/msaccess 

Windows NT Server (3.51 or 
later) 

www.microsoft.com/ntserver/default.asp 

Internet Information Server 

www.microsoft.com/iis/default.asp 

Internet Database Connector, or 
dbWeb (choose one) 

www.microsoft.com/accessdev/itk/idcfaq.htm 

www.microsoft.com/intdev/dbweb 


Table 7. Microsoft Internet Related Software Applications 

Because Access95 is already being used by the Society for its new database, 


it is the obvious choice for the database portion of the link. The added benefits of using 
Access95 are the user level password security it offers to protect data, and the fact that it 
already contains an ODBC driver as shipped. 

We recommend switching from the Windows95 operating system to Windows 
NT Server (3.51 or later). Windows95 is not sufficient for Web site hosting because it is a 
workstation operating system which does have any networking capability outside the 
Microsoft peer-to-peer networking service first introduced with Windows for Workgroups 
3.11. 


Windows NT Server is a complete multi-threaded, multi-tasking operating 
system offering centralized user access control, file-level security locks, and robust support 
for remote modem access and Internet connectivity [Minasi96]. NT Server scales easily to 
support large numbers of users while offering single-point administration of all users and 
resources. The Society’s security concerns are particularly well addressed because of NT 
Server’s C-2 security capability which precludes users from granting access rights. NT 
Server also has the ability to monitor all security related actions and keep a log for 
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subsequent review. Finally, NT Server offers the ability to selectively grant access to users 
(and groups of users with the same needs) for all methods of access including interactive 
(user is physically sitting at the computer), over a network, and via a modem. Access type 
is also controllable so while Society staff can be given full read, write, and create access, 
AE’s can be given password-controlled read-only access while prospective reviewers can be 
given write access only. 

For the Society staff who require at-home access to the new database, NT’s 
Remote Access Server (RAS) offers the user all of the capabilities that they would enjoy 
while actually sitting at the computer. The data security capability of RAS is quite strong 
and includes the capability to encrypt all transmitted data. Access security via encrypted 
passwords and telephone callback features is also quite strong. 

Internet Information Server (IIS) is the only Web server that integrates fully 
with Windows NT Server. HS runs all of the standard Internet services (Web, FTP, Gopher) 
and allows the site manager to grant/restrict, monitor and log user access to HTML pages, 
data files, and database records. By seamlessly integrating with the security features of NT 
Sever, HS provides the ability to require users to input identification and password 
information prior to gaining access to “privileged” information. It can even support the use 
of SSL if required. Finally, HS works with all standard client browsers hosted on almost any 
platform, thereby offering the universal access necessary for the Society’s world-wide users. 
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With respect to linking the Web server to the database, Microsoft’s Internet 
Database Connector (IDC) and dbWeb, as illustrated in Table 8, are both API applications 
which offer complete Web server/database connectivity and “wizard” publishing technology. 
The wizards work via a point and click metaphor to abstract much of the mundane 
programming required to build a link. Where the two applications differ, though, is that they 
are targeted towards two essentially separate types of users. 


TRAIT 

IDC 

dbWeb 

Wizard interface shields users from complexities of 
Web/database interface programming 

yes 

yes 

Primary Audience 

developer 

developer/power user 

Uses IS API architecture to run as a server process 

yes 

yes 

Can publish data from any ODBC compliant database 

yes 

yes 

Creates static pages 

yes 

no 

Creates dynamic pages in response to user input 

yes 

yes 

Creates files with SQL statements the developer can modify 

yes 

no 


Table 8. IDC and dbWeb 


IDC is a developer-oriented core technology of the Internet Information 
Server that provides connectivity between the Server and any ODBC-compliant database. 
It also serves as a foundation for developing custom database applications in concert with 
Internet Information Server. IDC provides a great deal of flexibility by allowing developers 
to directly program custom SQL statements, create their own HTML templates, or generate 
static database information “snapshots” which can be quickly loaded by the server in 
response to client requests. 
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dbWeb, while also offering connectivity between the Server and any ODBC- 
compliant database, uses its wizards to additionally abstract the intricacies of SQL 
progranuning. By offering this additional abstraction service, it prevents the user from 
having to actually do any programming in SQL. Also, dbWeb does not allow the generation 
of static HTML files and is therefore a little less robust than DC. 

b. Platform Selection 

Compiling the list of hardware required to use the above applications is very 
easy since most of the required hardware components are the ones already owned by the 
Society and listed in Chapter V. The only hardware item needed is some additional RAM 
in order to bring the system total up to the minimum recommended by Microsoft. 

c. Cost of Upgrading 

Upgrading the current system is quite inexpensive since all the current 
hardware plus Access95 can be kept. The major software expense is replacing Windows95 
with Windows NT Server (3.51 or later) since IIS and IDC are free add-ons to NT Server. 
The hardware upgrade is an inexpensive matter of inserting two more SIMMS (assuming 
current configuration is two SMB SIMMS). Table 9 lists the additional system support items 
required and includes an estimation of each item’s cost based upon market rates advertised 
in the San Jose Mercury during the month of December 1996. 
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Item 

Cost 

Microsoft NT 3.51 (5-client license) 

$750.00 

Internet Information Server 

free 

Internet Database Connector 

free 

16 MB RAM 

$75.00 


Table 9. Upgrade Costs 


Note: the number of recommended client licenses, although low, is sufficient because 
they actually enumerate the number of simultaneous connections the server can support. 

d. Migrating to the Web 

The switching of the database from one operating system to another is trivial 
since NT Server supports all of the Windows95 applications the Society is currently using. 
Still, migrating the current database to the Web requires a choice between two options. The 
first option is to put aside vulnerability concerns about granting Web access to the database 
and to simply use the single database application for staff use, prospective reviewer resume 
submission, and AE search queries. This requires extra care when implementing the link so 
that integrity control is never lost and no data will need to be re-generated. 

The other (and better) option is to actually have two separate database 
applications, one for staff use only, and the other a replicate for all Web-related use. 
Complete physical separation of the two databases offers greater security than trying to 
protect sub-portions of the database from net users. This separation and replication is 
possible because Access95 can actually import table data from external sources. Examples 
of external data sources include tables from other Access95 databases located on a network. 
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data from other programs such as Microsoft Excel, Microsoft FoxPro, Paradox, and 
Microsoft SQL Server, as well as data from other programs and files. In the Society’s case, 
importing data creates a copy of the information in one Access95 database to a new table in 
another Access95 database so the external data source's format is not altered. The Society 
staff can therefore continue to use its original program, while Web users use the other 
Access95 database to add or edit the copy of their own personal d a ta . 

Although imported data can't be appended to existing tables (except when 
importing spreadsheet or text files), once a table has been imported, an append query can be 
performed to add the data to a pre-existing table. When importing from another Access95 
database, the user can choose to import all or just a subset of the database in a single 
operation. Additionally, one of the best capabilities of Access95's import feature is that if 
the need exists to perform an import operation on an ongoing basis, the user can automate 
the process of importing data using macros or Visual Basic for Applications (VBA) code. 

In the future, if the Society decides to use another database or scale up to a 
different server suite, Access95 can also export data and the information needed to access 
that data to programs and databases which support the Open Database Connectivity (ODBC) 
protocol. Options include, but are not limited to, Microsoft SQL Server which is the logical 
extension to the use of Access95 in large scale operations. Of note, the information required 
to define and access an ODBC data source varies depending on the requirements of its 
ODBC driver. The developer must therefore create a data source definition for each ODBC 
database he wants to import from, export to, or link to the Access95 database. 
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2. System Design 


The required software will work together as illustrated in Figure 10. The IIS will 
listen on the Internet for incoming client requests and pass them off to NT Server. The 
Server will ascertain which URL is requested and forward resume submission requests to the 
secondary Access95 database while the AE-generated reviewer-name queries will be 
forwarded to the primary database. 
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Figure 10. Software Hierarchy 


a. Sample Queries 

The SQL queries that the database needs to support are very simple. After 
entering proper user authentication data based upon name and password, the AE will be 
presented with a form which allows him to choose (via a list box) one of the EDICS codes. 
The database will then perform an SQL query similar to the one below: 
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SELECT First, Last 

FROM PEOPLE 

WHERE Reviewer = “yes” 

And 

EDICS_Reviews = “EDICS#” 

The database responds with a simple list of reviewer names, each of which is a hyperlink to 

specific reviewer contact data (EDICS# is the code number selected by the user). Of more 

importance, the current database must be changed to incorporate a new field in the PEOPLE 

table that links the reviewers to the EDICS they review. 

Once the names have been returned as hyper-links, the AE can then scroll 

down the list, click on a few reviewer hyper-links and copy or save the contact data. The 

actual SQL code for executing this would be similar to the following pseudo-code in which 

“First” and “Last” represent the hyper-link data. 

SELECT * 

FROM PEOPLE 

WHERE First = “First” 

And 

Last = “Last” 

3. Potential New Features 

Once the extra system support has been installed and the link between the database 
and the Web is operating correctly, the Society may choose to expand the site’s capabilities. 
Example additions include enabling NT’s FTP service so members can access and download 
large review instruction files or authors can download paper format templates. The potential 
would then exist for authors to submit their original papers and reviewers to download, 
review, and resubmit the annotated papers via FTP instead of by postal mail. The initial use 
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of the FTP service is less important than the fact that NT’s FTP capability exists and offers 
exciting new opportunities for the Society to expand upon the functionality that is already 
offered via other sites linked to the main IEEE homepage. 

The incorporation of an AE email address library categorized by EDICS codes that 
can facilitate communications between review personnel (Society Staff, reviewers, AE’s) is 
another potential use of the Web site. Implementation would be a simple matter of extending 
the web site by adding another form page which would incorporate the appropriate SQL 
query. 

As the Internet continues to grow, the Society may even want to consider creating a 
subset of the online “virtual” magazine discussed in Chapter V’s “Other Plans” section. 
Online publishing of letters to the editor can be fully supported with the system 
recommended by this thesis. More importantly, this will serve a test bed for the creation of 
a complete online magazine in which all paper submission, review and publishing occurs on 
the Web. The bottom line is that the Society must resolve to grapple with Web/DBMS link 
issues and begin its transition to providing content on the Web if it is to remain relevant in 
the electronic age. 

4. Impact of Planned Software Upgrades 

Finally, according to information located on its Web site, Microsoft is planning to 
release major upgrades to its software in the first quarter of CY97. We have investigated the 
press releases surrounding these upgrades and when this occurs, the functions currently 
offered by the IDC, dbWeb and Internet Assistant will be embedded in NT Server and 
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Access97 respectively. This will not change any of this thesis’ recommendations and will, 
in fact, actually serve to further ease the Society’s transition to the Web. 
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Vn. SUMMARY AND CONCLUSION 


A. SUMMARY 

The objectives of this thesis were to 

• create a taxonomy of existing dynamic Web\database linking methods 

• examine the issues surrounding this linkage of a database to Web 

• investigate a case study linkage problem 

En-route to meeting these objectives, a thorough review of currently available and 
utilized methods, protocols and standards for connecting a relational database to the Web 
was conducted. As work progressed, the information collected through the course of this 
review revealed that the myriad of link products currently available actually fall within a 
small number of general categories, each of which has it own strengths and weaknesses with 
respect to the issues underlying the connection of a database to the Web. 

The use of a case study and an analysis of its associated problems offered an 
opportunity to investigate the potential use of each general link category in a prototypical 
situation. The ensuing discussion of an actual design for implementing the most appropriate 
of the potential approaches for the case study revealed the validity of the taxonomy and 
universal nature of the issues. The suggested design delivered all of the benefits expected 
from the creation of a Web/database link while addressing all the underlying link issues. 
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Finally, the database technology research conducted in this thesis may actually 
become rather quickly dated due to the speed with which Internet-related technologies are 
being developed. This is true because database, operating system, and middleware product 
vendors all approach the link issues from different directions and they are all in competition 
to strengthen the connectivity between the Web and database applications. With the 
marketplace driving the rapid pace, the possibility of new categories for link approaches 
being developed is quite high. 

B. ADDITIONAL RESEARCH AND DEVELOPMENT 

Although this study has amply illustrated the viability of the different Web/database 
link approaches and even provided a framework for implementing one of these link 
approaches, the possibility exists for further research. By exploiting the power of the 
Internet, a change can be made from the postal delivery of paper media to video presentation 
of electronic content from points both internal and external to a user’s computer. As a result, 
it may no longer be necessary for people to gather in one geographic location to create and 
maintain a database which is linked to the Web. This thesis provides a framework and 
cornerstone for ongoing work which may achieve the goal of developing a DBMS which is 
completely disconnected from geographical limitations. 
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APPENDIX A. SUMMARY OF CONVERSATION 


On Wednesday, 27 November 1996, the following information was transcribed by 
LCDR Dean during a telephone conversation with: 

Mercy Kowalczyk, 

Executive Director 

IEEE Signal Processing Society 

445 Hoes Lane 

PO Box 1331 

Piscataway, NJ. 08855-1331 
A. THE REVIEW PROCESS 

1. Current Situation 

The Society’s current manuscript review process is as follows: 

• Authors desiring publishing must suggest an appropriate EDICS code for 
their paper based upon the manuscript’s subject matter and mail seven copies 
of their manuscript, abstract and references to the Society. 

• Based upon the suggested EDICS code. Society staff chooses one of the 
code’s associated AE’s and forwards six of the seven manuscript copies. 

• The AE compiles a list of potential reviewers from related publication 
reference lists, word-of-mouth recommendations, and personal acquaintances, 
and then attempts to enlist their services. Of note, some reviewers use 
students to perform “ghost reviews” that reviewers subsequently check. 
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• Once two or three contactees agree to review, the AE mails them copies of 
the manuscript. 

• The reviewers read the manuscript, aimotate suggested changes, and mail the 
papers back to the AE for forwarding to the author who remains “blind” to 
his paper’s reviewer identities. 

• The author responds to the suggested changes by either incorporating them 
outright or debating them (via the AE) with the reviewers. This may take 
several iterations prior to reaehing total agreement. 

• Once the author and reviewers agree on the paper’s eontent, the AE must 
decide whether to publish or reject the paper, notify the author, and inform 
the Society of his decision. 

2. Problems 

Several problems are associated with the current review process: 

• AE’s usually do not have mueh administrative support and therefore tend to 
sit on the manuscripts. 

• Since reviewers are extremely hard to find, the tendency is for multiple AE’s 
to repeatedly contact the same reviewers. 

• Many knowledgeable and potentially willing reviewers remain unidentified 
by AE’s due to lack of publishing or separate social/professional circles. 
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• A general call to the public to identify additional reviewers may generate too 
many responses from unqualified people or those whose qualifications would 
be difficult to vet. 

• Some authors publish regularly but are either unwilling to review the work 
of others or are not identified as potential reviewers. These members need 
to be identified and encouraged to help out. 

3. Changes Underway 

Several changes to the current process are being implemented. First, instead 
of forwarding six copies of each manuscript to the AE for his subsequent re-mailing to 

reviewers, the Society plans to wait until the AE has provided a list of reviewers and then 
mail manuscript copies directly to the reviewers. 

Secondly, the Society has hired an outside consultant to create a Microsoft Access95 
database application to do/aid the following: 

• Log in submitted manuscripts as they arrive 

• Automatically generate a receipt letter to send to the author 

• Automatically generate a list of AE’s responsible for the manuscripts EDICS 
code. 

• Track AE assignments and limit AE responsibilities to no more than three 
manuscripts per month 

• Provide AE’s with suggested reviewer names. 
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Finally, reviewers will be required to pre-agree to complete the review within six 
weeks of manuscript receipt to ensure manuscripts are published on a timely basis. 

4. Future Plans 

The Society is considering the acceptance of electronically submitted abstracts and 
reference lists that can be forwarded to the AE’s and thereby offer them a head start in 
identifying reviewers. Once identified, the AE will send the reviewer names to the Society 
which will mail them manuscript copies. It is hoped that this new procedure will reduce the 
amount of time that manuscripts spend awaiting assignment or in transit and thus speedup 
the entire review process. 

Allowing AE’s Web access to the new database is also being considered and thus the 
Society’s support for this thesis as an initial investigation into the relevant issues. The Web 
link has yet to be started due to concerns over maintaining data integrity and reviewer 
confidentiality from the authors. This is exasperated by the fact that, in some cases, AE’s 
have responsibility for the very EDICS code to which their own papers have been assigned. 
Because of these concerns, the following in-variants for any Web/database link were stated: 

• AE’s need to search for and access reviewer names and contact information 
contained in the new database must be limited to protect database integrity 
and the confidentiality of the review process. 

• Only Society staff and the accountable AE shall know of a particular 
author/reviewer grouping. 

• The Society’s intellectual assets must be protected from competitors. 


98 




New reviewers must be able to register with the Society in a manner that will 
facilitate Society verification of qualifications and familiarity with review 
procedures. 
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APPENDIX B. USER QUESTIONNAIRE 


On December 2,1996, this questionnaire was faxed to: 

Mercy Kowalczyk, 

Executive Director 
IEEE Signal Processing Society 
445 Hoes Lane 
PO Box 1331 

Piscataway, NJ. 08855-1331 

The purpose of the questionnaire is to re-address (in detail) certain points first 
brought to light in Appendix A, specifically those concerning the new database and the 
potential use of a Web/database link facilitating the sohcitation of new manuscript reviewers. 
Her responses on December 5, 1996 are written below each question in italics. 

A. HARDWARE 


What computer hardware is the new database implemented on? 


CPU: 133 MHZ Pentium 

RAM: 16 MB 

CD-ROM: Yes 

Hard Drive: 1 GB 


TAPEBU: Yes 

MODEM: Yes 

NETWORK INTERFACE CARD: 


Yes 


Is the computer currently connected to the Internet? 

Yes, via an internal cross-departmental Ethernet-based Windows For 
Workgroups 3.11 peer-to-peer LAN. This may change since the Journal staff 
desires to separate from the enterprise level LAN for privacy reasons and to 
facilitate an impending move to another physical location. 
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• How much hard drive disk space is available for additional Web/database 
link software? 

Unknown 

B. SOFTWARE 

• Assumed: the reviewer database is implemented with Microsoft Access95. 
Yes 

• Assumed: Windows95 is the operating system. 

Yes 

• Is any other software being used to implement the database? 

No 

• Will the computer be dedicated solely to the database? 

It is not now. The database may be moved to a dedicated PC server so the 
appropriate staff can have unrestricted access. Ultimate plan is to give only 
three staff members password enabled access to the database. An additional 
requirement will be for remote D.B.A. access due to one staff member 
requiring the ability to work from home. 

C. DATABASE 

• What data are held in the database for each reviewer, author, and Associate 

Editor (AE), and how are they individually identified? 

Note: the fields listed below differ from the ones actually documented in 
[IEEE96-2] but are included for transcription accuracy purposes. 
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NAME (Last, First) ==>Primary Key 

AUTHOR (yes/no) 

REVIEWER (yes/no) 

AE (yes/no) 

MAIL ADDRESS 

TELEX 

PHONE 

EMAIL 

FAX 

MISC (this is where the associated Editor’s Information 

Classification Scheme (EDICS) codes are entered) 

How many of each type of person will be entered in the database? 

Approximately 40 AE’s activated on a daily basis for each of three 
Transactions (Digital, Audio, Image). This equals approximately ( ? )for the 
whole database. 

What other TABLES are in the database and what are their fields? 

There are at least two other tables. Neither one appears to be normalized as 
described by Ms. Kowalczyk. 

For the EDICS table (Primary Key unknown) the fields are: 
TRANSACTION (the three Transaction Journals share the same code 
numbers but the subject matter of each code number 
changes from Transaction to Transaction) 

EDICS CODE 
EDICS SUBJECT 
AENAME 

For the MANUSCRIPT table (Primary Key unknown) the fields are: 

AUTHOR 

TITLE 

CONTROL NUMBER ==>Primary Key (5-digit number preceded by A, I, 

D to signify associated Transaction) 

DATE RECEIVED 
INITIAL PAGE COUNT 
AE ASSIGNED 
DATE FORWARDED TO AE 
DATE SENT TO REVIEWERS 
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REVIEWER NAMES 

PUBLISH DECISION (yes/no) 

DECISION REASON 

DATE OF DECISION 

DATE TO BE PUBLISHED 

FINAL PAGE COUNT 

DATE FINAL MANUSCRIPT RECEIVED 


NOTE: a record is created for each manuscript received whether it is kept for 
review or rejected outright by the staff for format reasons. 

Are EDICS/AE relationships captured by the database? 

Yes, by a JOIN(?) between the EDICS and PERSON tables 

Are AE/reviewer relationships captured by the database? 

Yes. by a JOIN(?) between the MANUSCRIPT and PERSON tables 

What specific data for reviewers, authors and AE’s must be accessible 

only by office staff ? 

All 

What specific information must be captured on people signing up to be a 
reviewer? 

All fields in the PERSON table plus: 

EDUCATION 

IEEE MEMBER NUMBER/STATUS 

BIOGRAPHY? (suggestions include papers/articles published, special 
expertise, etc.) 

REFERENCES WITH CONTACT INFORMATION 
EDICS CATEGORIES WANTING TO REVIEW 
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What are the top pieces of information the Society would like to have on 
newly signed up reviewers to ensure credibility, authenticity, and motivation 
of the individuals? 

References, educational certifications, and which EDICS categories they are 
willing to review for. The latter is important because there are trends and 
fads in DSP subject matter such as fuzjy losic. wavelets, and multimedia 
while other areas such as underwater are rather esoteric, all leading to the 
continual changing of EDICS specific workload. 

What information will be made available to the Internet? 

None, total separation of database and Internet is desired. Anticipated use 
is to have potential reviewers submit information and then have the staff 
conduct an initial review, forward the applicant’s information to an 
appropriate AE, and have the AE vet the applicant and inform the Society 
whether the applicant should be added to the reviewer database. 

The problem with giving AE’s access to the database is that they are only 
appointed (on a voluntary basis) for a short term (long enough to oversee the 
review of 25 manuscripts). Since there is a revolving door of AE’s and a 
high percentage of them are authors, access becomes problematic. 
Furthermore, due to the small nature of the stcff and supporting budget, it is 
absolutely imperative that the database not be sullied by inadvertent mistakes 
or malicious intent. The two absolutes are: 

1- to preserve/protect database integrity 

2- to protect sanctity and confidentiality of review process 

Will the information be tailored to specific types of accessors? 

No. This question is irrelevant since the Society does not desire to connect 
its database to the Internet. 

Are there any statistics that the database must keep? 

Author, reviewer periodicity, ergo, how many times reviewer has been used. 
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Can I have a copy of the database tables with fictitious data? 


Possibly forthcoming, time-line unknown 

• ASSUMED: Web connected database information will be READ only and 
no direct WRITE capability will be granted. 

This is the crucial matter that is addressed by this thesis. Although Mercy 
feels personally that the answer should be FALSE, and that no link should be 
allowed between her database and Internet, she is willing to discuss the 
matter as she recognizes that others within the IEEE organization may feel 
differently. 

• ASSUMED: Individually submitted reviewer information will be checked 

and entered by office staff to maintain database integrity. 

FALSE, after initial Society review for completeness, the data will be 
forwarded to an appropriate AEfor vetting. This will potentially require an 
additional field in PERSON table to enter the reviewer’s qualifying AE’s 
name. 

D. FINANCIAL 

• If the decision is made to make certain information available from , the 

Internet, will additional money be available, and if so, how much? 

Unknown, depends upon the worthiness of the project, will not be a large 
sum. 

E. PEOPLE 

• Who will access the database in the office? 

Appropriate staff 

• Who will maintain the database and ensure that the data is correct? 
Appropriate staff 
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Who does the Society want to access the database from the Internet ? 


Nobody (see discussion in next question) 

• Who does the Society not want to access the database from the Internet ? 

As stated in section C. above, this a contentious point. Mercy feels that 
everybody should be excluded. What is not clear at this time is whether 
there truly are a lot of people out there who are ready, willing and able to 
review and yet they are unknown to the Society. As an aside, a lot of people 
are uncomfortable putting personal information on the Internet and this may 
impact the willingness of potential reviewers to submit a resume over the 
Internet. 

F. TIMING 

• Does a deadline exist for Internet connectivity? 

No 

G. MISCELLANEOUS 

• How is an EDICS code assigned to a newly submitted paper? 

Suggested by the author 

• Does the Society foresee extensions to the currently planned use of the 
database as a central reservoir of reviewer information. In other words, what 
else is planned? 

In January of1997 the Society will begin publishing the letters to the Journal 
on the Internet with the hope of eventually offering online subscription/access 
to the Journal itself. What is not known at this time is how the current 
income from subscription fees will be replaced. Since the subscription fees 
are the lifeblood of the Journal, this is a very serious concern. 

Finally, Ms. Kowalczyk re-iterated her desire and willingness to aid this case 
study’s research phase, and also stated her understanding and willingness 
to work within the time constraints of the thesis completion schedule. 
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